2015-05-05 05:22:12

by Al Viro

Subject: [RFC][PATCHSET] non-recursive pathname resolution

My apologies for the patchbomb from hell, but I really think
it needs a public review.

There are several things in that queue; most of it is a rewrite of
fs/namei.c, and closer to the end we get
* non-recursive link_path_walk()
* the only symlink-related limitation is that there should be no
more than 40 symlink traversals, no matter how nested they are.
* stack footprint independent of the nesting depth and about on par with
mainline in the "no symlinks" case.
* some work towards the symlink traversal without dropping from
RCU mode; not complete, but much better starting point than the current
mainline.
* IMO more readable logic around pathname resolution in
general, but then I've spent the last couple of weeks messing with that code,
so my perception might be seriously warped.

Most of that stuff is mine, with the exception of 3 commits from
Neil's series (with some modifications). I've marked those with [N] in
the list below.

It really needs review and more testing; it *does* pass xfstests,
LTP and local tests with no regressions, but at the very least it'll need
careful profiling, as well as more eyes to read it through. The whole
thing is based at -rc1. Individual patches are in followups; those who
prefer to access it via git can fetch it at
git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git link_path_walk

Please, read and comment.

Roadmap:

* assorted preliminary patches:

[1/79] 9p: don't bother with 4K allocation for 24-byte local array...
[2/79] 9p: don't bother with __getname() in ->follow_link()
[3/79][N] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
[4/79] ext4: split inode_operations for encrypted symlinks off the rest

* getting rid of boilerplate for inline symlinks

[5/79] libfs: simple_follow_link()
[6/79] ext2: use simple_follow_link()
[7/79] befs: switch to simple_follow_link()
[8/79] ext3: switch to simple_follow_link()
[9/79] ext4: switch to simple_follow_link()
[10/79] jffs2: switch to simple_follow_link()
[11/79] shmem: switch to simple_follow_link()
[12/79] debugfs: switch to simple_follow_link()
[13/79] ufs: switch to simple_follow_link()
[14/79] ubifs: switch to simple_follow_link()
[15/79] sysv: switch to simple_follow_link()
[16/79] jfs: switch to simple_follow_link()
[17/79] freevxfs: switch to simple_follow_link()
[18/79] exofs: switch to {simple,page}_symlink_inode_operations
[19/79] ceph: switch to simple_follow_link()

* more assorted preparations:
[20/79] logfs: fix a pagecache leak for symlinks
[21/79][N] SECURITY: remove nameidata arg from inode_follow_link.
[22/79] uninline walk_component()

* getting trailing symlinks handling in do_last() into saner shape:
[23/79] namei: take O_NOFOLLOW treatment into do_last()
[24/79] do_last: kill symlink_ok
[25/79] do_last: regularize the logics around following symlinks

* getting rid of nameidata on stack during ->rmdir()/->unlink() and double
nameidata during ->rename():
[26/79] namei: get rid of lookup_hash()
[27/79] name: shift nameidata down into user_path_walk()

* single instance of nameidata across the adjacent path_mountpoint() calls:
[28/79] namei: lift nameidata into filename_mountpoint()

At that point preparations end. The next one changes the calling conventions
for ->follow_link/->put_link. We are still passing nameidata to
->follow_link(), but almost all instances do not use it anymore. Moreover,
nd->saved_names[] is gone and so's nameidata in generic_readlink().
That one has the biggest footprint outside of fs/namei.c - it's a flagday
change.

[29/79] new ->follow_link() and ->put_link() calling conventions

* some preparations for link_path_walk() massage:
[30/79] namei.c: separate the parts of follow_link() that find the link body
[31/79] namei: don't bother with ->follow_link() if ->i_link is set
[32/79] namei: introduce nameidata->link
[33/79] do_last: move path there from caller's stack frame

* link_path_walk()/follow_link() rewrite. It's done in small steps,
getting the damn thing into a form that would allow getting rid of recursion:

[34/79] namei: expand nested_symlink() in its only caller
[35/79] namei: expand the call of follow_link() in link_path_walk()
[36/79] namei: move the calls of may_follow_link() into follow_link()
[37/79] namei: rename follow_link to trailing_symlink, move it down
[38/79] link_path_walk: handle get_link() returning ERR_PTR() immediately
[39/79] link_path_walk: don't bother with walk_component() after jumping link
[40/79] link_path_walk: turn inner loop into explicit goto
[41/79] link_path_walk: massage a bit more
[42/79] link_path_walk: get rid of duplication
[43/79] link_path_walk: final preparations to killing recursion

* dumb and straightforward "put all variables we needed to be separate in
each frame into an array of structs" killing of recursion. The array is still
on the stack frame of link_path_walk(), we still have the same depth limits,
etc. All of that will change shortly afterwards.
[44/79] link_path_walk: kill the recursion
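
A minimal userspace sketch of that transformation, assuming a toy model
rather than the real fs/namei.c code: what a recursive walker would keep in
each stack frame (here only the unwalked rest of the path) is saved in an
array slot instead, and "returning" from a nested symlink is just popping a
slot. The names toy_link, toy_readlink() and toy_walk() are hypothetical;
only the 40-traversal limit is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXSYMLINKS 40	/* total traversals allowed, per the cover letter */

/* Hypothetical toy symlink table: component name -> link body. */
struct toy_link {
	const char *name;
	const char *body;
};

/* Return the body if the component of length len is a symlink, else NULL. */
static const char *toy_readlink(const struct toy_link *tab, size_t n,
				const char *name, size_t len)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(tab[i].name) == len && !memcmp(tab[i].name, name, len))
			return tab[i].body;
	return NULL;
}

/*
 * Walk a path, expanding symlinks without recursion.  Returns the number of
 * non-symlink components visited, or -1 once more than MAXSYMLINKS symlinks
 * have been traversed in total.
 */
static int toy_walk(const struct toy_link *tab, size_t n, const char *path)
{
	const char *stack[MAXSYMLINKS];	/* saved "rest of the path", one per level */
	int depth = 0, total = 0, visited = 0;
	const char *p = path;

	for (;;) {
		while (*p == '/')
			p++;
		if (!*p) {
			if (!depth)
				return visited;	/* outermost path is done */
			p = stack[--depth];	/* "return" from a nested link */
			continue;
		}
		size_t len = strcspn(p, "/");
		const char *body = toy_readlink(tab, n, p, len);

		p += len;
		if (!body) {
			visited++;		/* ordinary component */
			continue;
		}
		if (++total > MAXSYMLINKS)
			return -1;		/* too many symlinks */
		assert(depth < MAXSYMLINKS);	/* depth never exceeds total - 1 here */
		stack[depth++] = p;		/* remember where to resume */
		p = body;			/* continue inside the link body */
	}
}
```

In this sketch the array lives in the walker's own frame, which corresponds
to the intermediate state described for this step, not to the end result of
the series.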

* post-operation cleanup:
[45/79] link_path_walk: split "return from recursive call" path
[46/79] link_path_walk: cleanup - turn goto start; into continue;

* array moves into nameidata, with extra (0th) element used instead of
link/cookie pairs of local variables we used to have in trailing symlink
loops:
[47/79] namei: move link/cookie pairs into nameidata

* more post-op cleanup - arguments of several functions are always the
fields of the same array element or pointers to such:
[48/79] namei: trim redundant arguments of trailing_symlink()
[49/79] namei: trim redundant arguments of fs/namei.c:put_link()
[50/79] namei: trim the arguments of get_link()

* trim the array in nameidata to a couple of elements, use it until we
need more and allocate a big (41-element) array when needed. Result:
no more nesting limitations.
[51/79] namei: remove restrictions on nesting depth
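
As an illustration only (userspace C, hypothetical names, not the code from
the patch): a small array embedded in the structure is used for the common
case, and a 41-element array is allocated the first time it overflows. The
toy_nd, toy_saved and toy_nd_*() names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXSYMLINKS	40
#define EMBEDDED_LEVELS	2	/* "a couple of elements" */

struct toy_saved {
	const char *name;	/* stand-in for whatever is saved per level */
};

struct toy_nd {
	struct toy_saved *stack;	/* internal[] until we outgrow it */
	struct toy_saved internal[EMBEDDED_LEVELS];
	int depth;
};

static void toy_nd_init(struct toy_nd *nd)
{
	nd->stack = nd->internal;
	nd->depth = 0;
}

/* Push one level; returns 0 on success, -1 on overflow or allocation failure. */
static int toy_nd_push(struct toy_nd *nd, const char *name)
{
	if (nd->depth >= MAXSYMLINKS + 1)
		return -1;	/* the big array holds 41 elements at most */
	if (nd->depth == EMBEDDED_LEVELS && nd->stack == nd->internal) {
		/* First overflow of the embedded array: switch to a big one. */
		struct toy_saved *big = malloc((MAXSYMLINKS + 1) * sizeof(*big));

		if (!big)
			return -1;
		memcpy(big, nd->internal, sizeof(nd->internal));
		nd->stack = big;
	}
	/* We must never write past the embedded array. */
	assert(nd->depth < EMBEDDED_LEVELS || nd->stack != nd->internal);
	nd->stack[nd->depth++].name = name;
	return 0;
}

/* Free the big array if one was allocated and reset to the embedded one. */
static void toy_nd_release(struct toy_nd *nd)
{
	if (nd->stack != nd->internal)
		free(nd->stack);
	toy_nd_init(nd);
}
```

The point of the split is that a walk with little or no symlink nesting never
touches the allocator, while deep nesting costs one allocation per walk.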

* the use of the array is seriously counterintuitive - for everything except
the last component we are _not_ using the first element of the array; moreover,
way too many places are manipulating nd->depth. I've tried to sanitize
all of that in one patch, but after several failures I ended up splitting
the massage into smaller steps. The composition of those has ended up
fairly small, but convincing yourself that it's correct will mean reproducing
this splitup ;-/
[52/79] link_path_walk: nd->depth massage, part 1
[53/79] link_path_walk: nd->depth massage, part 2
[54/79] link_path_walk: nd->depth massage, part 3
[55/79] link_path_walk: nd->depth massage, part 4
[56/79] trailing_symlink: nd->depth massage, part 5
[57/79] get_link: nd->depth massage, part 6
[58/79] trailing_symlink: nd->depth massage, part 7
[59/79] put_link: nd->depth massage, part 8
[60/79] link_path_walk: nd->depth massage, part 9
[61/79] link_path_walk: nd->depth massage, part 10
[62/79] link_path_walk: end of nd->depth massage
[63/79] namei: we never need more than MAXSYMLINKS entries in nd->stack

* getting more regular rules for calling terminate_walk()/put_link().
The reason for that part is (besides getting simpler logic in the end) that
with the RCU-symlink series we'll really need to have terminate_walk() trigger
put_link() on everything we have on stack:

[64/79] namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers
[65/79] lift terminate_walk() into callers of walk_component()
[66/79] namei: lift (open-coded) terminate_walk() into callers of get_link()
[67/79] namei: take put_link() into {lookup,mountpoint,do}_last()
[68/79] namei: have terminate_walk() do put_link() on everything left
[69/79] link_path_walk: move the OK: inside the loop

* more put_link() work - basically, we are consolidating the "we decide
that we've found a symlink that needs to be followed" logic. Again,
that's one of the areas RCU-symlink will need to modify:

[70/79] namei: new calling conventions for walk_component()
[71/79] namei: make should_follow_link() store the link in nd->link
[72/79] namei: move link count check and stack allocation into pick_link()

At that point the parts of the core work I consider reasonably stable
are over. The next 4 commits are about not passing nameidata to ->follow_link() -
it's the same situation as with pt_regs and IRQ handlers and the same kind
of solution. Only 2 instances out of 26 use nameidata at all, and only
to pass it to nd_jump_link(); seeing that we already have the "begin
using this nameidata instance/end using it" pairs bracketing the areas
where a thread works with a given instance (we need that for sane freeing
of on-demand allocated nd->stack), the solution is obvious...

[73/79] lustre: rip the private symlink nesting limit out
[74/79][N] VFS: replace {, total_}link_count in task_struct with pointer to nameidata
[75/79] namei: simplify the callers of follow_managed()
[76/79] don't pass nameidata to ->follow_link()

The rest is very preliminary bits and pieces for RCU-symlink; it's almost
certainly going to be replaced and can be pretty much ignored for now:

[77/79] namei: move unlazying on symlinks towards get_link() call sites
[78/79] namei: new helper - is_stale()
[79/79] namei: stretch RCU mode into get_link()


2015-05-05 05:24:02

by Al Viro

Subject: [PATCH 01/79] 9p: don't bother with 4K allocation for 24-byte local array...

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/9p/vfs_inode.c | 26 +++++---------------------
1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 703342e..cda68f7 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1370,6 +1370,8 @@ v9fs_vfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
return v9fs_vfs_mkspecial(dir, dentry, P9_DMSYMLINK, symname);
}

+#define U32_MAX_DIGITS 10
+
/**
* v9fs_vfs_link - create a hardlink
* @old_dentry: dentry for file to link to
@@ -1383,7 +1385,7 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
int retval;
- char *name;
+ char name[1 + U32_MAX_DIGITS + 2]; /* sign + number + \n + \0 */
struct p9_fid *oldfid;

p9_debug(P9_DEBUG_VFS, " %lu,%pd,%pd\n",
@@ -1393,20 +1395,12 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
if (IS_ERR(oldfid))
return PTR_ERR(oldfid);

- name = __getname();
- if (unlikely(!name)) {
- retval = -ENOMEM;
- goto clunk_fid;
- }
-
sprintf(name, "%d\n", oldfid->fid);
retval = v9fs_vfs_mkspecial(dir, dentry, P9_DMLINK, name);
- __putname(name);
if (!retval) {
v9fs_refresh_inode(oldfid, d_inode(old_dentry));
v9fs_invalidate_inode_attr(dir);
}
-clunk_fid:
p9_client_clunk(oldfid);
return retval;
}
@@ -1425,7 +1419,7 @@ v9fs_vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rde
{
struct v9fs_session_info *v9ses = v9fs_inode2v9ses(dir);
int retval;
- char *name;
+ char name[2 + U32_MAX_DIGITS + 1 + U32_MAX_DIGITS + 1];
u32 perm;

p9_debug(P9_DEBUG_VFS, " %lu,%pd mode: %hx MAJOR: %u MINOR: %u\n",
@@ -1435,26 +1429,16 @@ v9fs_vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rde
if (!new_valid_dev(rdev))
return -EINVAL;

- name = __getname();
- if (!name)
- return -ENOMEM;
/* build extension */
if (S_ISBLK(mode))
sprintf(name, "b %u %u", MAJOR(rdev), MINOR(rdev));
else if (S_ISCHR(mode))
sprintf(name, "c %u %u", MAJOR(rdev), MINOR(rdev));
- else if (S_ISFIFO(mode))
- *name = 0;
- else if (S_ISSOCK(mode))
+ else
*name = 0;
- else {
- __putname(name);
- return -EINVAL;
- }

perm = unixmode2p9mode(v9ses, mode);
retval = v9fs_vfs_mkspecial(dir, dentry, perm, name);
- __putname(name);

return retval;
}
--
2.1.4
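
The buffer sizes in the patch above can be sanity-checked in userspace. This
is only a sketch, assuming a 32-bit int and the same format strings as in the
diff; toy_link_name_size() and toy_mknod_name_size() are hypothetical helpers
that ask snprintf() how many bytes the worst-case strings need.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define U32_MAX_DIGITS 10	/* same value as in the patch */

/* Bytes needed for "%d\n" applied to fid, including the terminating NUL. */
static int toy_link_name_size(int fid)
{
	return snprintf(NULL, 0, "%d\n", fid) + 1;
}

/* Bytes needed for "<type> <major> <minor>", including the terminating NUL. */
static int toy_mknod_name_size(char type, unsigned int major, unsigned int minor)
{
	return snprintf(NULL, 0, "%c %u %u", type, major, minor) + 1;
}

/* Worst cases, assuming 32-bit int: both must fit the arrays in the patch. */
static void toy_check_sizes(void)
{
	assert(toy_link_name_size(INT_MIN) <= 1 + U32_MAX_DIGITS + 2);
	assert(toy_mknod_name_size('b', UINT_MAX, UINT_MAX) <=
	       2 + U32_MAX_DIGITS + 1 + U32_MAX_DIGITS + 1);
}
```

With 32-bit values the second bound comes to 24 bytes, which matches the
"24-byte local array" in the subject line.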

2015-05-05 05:29:47

by Al Viro

Subject: [PATCH 02/79] 9p: don't bother with __getname() in ->follow_link()

From: Al Viro <[email protected]>

We copy there a kmalloc'ed string and proceed to kfree that string immediately
after that. Easier to just feed that string to nd_set_link() and _not_
kfree it until ->put_link() (which becomes kfree_put_link() in that case).

Signed-off-by: Al Viro <[email protected]>
---
fs/9p/v9fs.h | 2 --
fs/9p/vfs_inode.c | 93 ++++++++++----------------------------------------
fs/9p/vfs_inode_dotl.c | 31 +++++------------
3 files changed, 26 insertions(+), 100 deletions(-)

diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index fb9ffcb..0923f2c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -149,8 +149,6 @@ extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
-extern void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *p);
extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
struct p9_fid *fid,
struct super_block *sb, int new);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index cda68f7..0ba1171 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1224,103 +1224,46 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
}

/**
- * v9fs_readlink - read a symlink's location (internal version)
+ * v9fs_vfs_follow_link - follow a symlink path
* @dentry: dentry for symlink
- * @buffer: buffer to load symlink location into
- * @buflen: length of buffer
+ * @nd: nameidata
*
*/

-static int v9fs_readlink(struct dentry *dentry, char *buffer, int buflen)
+static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
- int retval;
-
- struct v9fs_session_info *v9ses;
- struct p9_fid *fid;
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ struct p9_fid *fid = v9fs_fid_lookup(dentry);
struct p9_wstat *st;

- p9_debug(P9_DEBUG_VFS, " %pd\n", dentry);
- retval = -EPERM;
- v9ses = v9fs_dentry2v9ses(dentry);
- fid = v9fs_fid_lookup(dentry);
+ p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
+
if (IS_ERR(fid))
- return PTR_ERR(fid);
+ return ERR_CAST(fid);

if (!v9fs_proto_dotu(v9ses))
- return -EBADF;
+ return ERR_PTR(-EBADF);

st = p9_client_stat(fid);
if (IS_ERR(st))
- return PTR_ERR(st);
+ return ERR_CAST(st);

if (!(st->mode & P9_DMSYMLINK)) {
- retval = -EINVAL;
- goto done;
+ p9stat_free(st);
+ kfree(st);
+ return ERR_PTR(-EINVAL);
}
+ if (strlen(st->extension) >= PATH_MAX)
+ st->extension[PATH_MAX - 1] = '\0';

- /* copy extension buffer into buffer */
- retval = min(strlen(st->extension)+1, (size_t)buflen);
- memcpy(buffer, st->extension, retval);
-
- p9_debug(P9_DEBUG_VFS, "%pd -> %s (%.*s)\n",
- dentry, st->extension, buflen, buffer);
-
-done:
+ nd_set_link(nd, st->extension);
+ st->extension = NULL;
p9stat_free(st);
kfree(st);
- return retval;
-}
-
-/**
- * v9fs_vfs_follow_link - follow a symlink path
- * @dentry: dentry for symlink
- * @nd: nameidata
- *
- */
-
-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- int len = 0;
- char *link = __getname();
-
- p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
-
- if (!link)
- link = ERR_PTR(-ENOMEM);
- else {
- len = v9fs_readlink(dentry, link, PATH_MAX);
-
- if (len < 0) {
- __putname(link);
- link = ERR_PTR(len);
- } else
- link[min(len, PATH_MAX-1)] = 0;
- }
- nd_set_link(nd, link);
-
return NULL;
}

/**
- * v9fs_vfs_put_link - release a symlink path
- * @dentry: dentry for symlink
- * @nd: nameidata
- * @p: unused
- *
- */
-
-void
-v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
-{
- char *s = nd_get_link(nd);
-
- p9_debug(P9_DEBUG_VFS, " %pd %s\n",
- dentry, IS_ERR(s) ? "<error>" : s);
- if (!IS_ERR(s))
- __putname(s);
-}
-
-/**
* v9fs_vfs_mkspecial - create a special file
* @dir: inode to create special file in
* @dentry: dentry to create
@@ -1514,7 +1457,7 @@ static const struct inode_operations v9fs_file_inode_operations = {
static const struct inode_operations v9fs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = v9fs_vfs_follow_link,
- .put_link = v9fs_vfs_put_link,
+ .put_link = kfree_put_link,
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
};
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 9861c7c..bc2a91f 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -912,33 +912,18 @@ error:
static void *
v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
{
- int retval;
- struct p9_fid *fid;
- char *link = __getname();
+ struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
+ int retval;

p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

- if (!link) {
- link = ERR_PTR(-ENOMEM);
- goto ndset;
- }
- fid = v9fs_fid_lookup(dentry);
- if (IS_ERR(fid)) {
- __putname(link);
- link = ERR_CAST(fid);
- goto ndset;
- }
+ if (IS_ERR(fid))
+ return ERR_CAST(fid);
retval = p9_client_readlink(fid, &target);
- if (!retval) {
- strcpy(link, target);
- kfree(target);
- goto ndset;
- }
- __putname(link);
- link = ERR_PTR(retval);
-ndset:
- nd_set_link(nd, link);
+ if (retval)
+ return ERR_PTR(retval);
+ nd_set_link(nd, target);
return NULL;
}

@@ -1006,7 +991,7 @@ const struct inode_operations v9fs_file_inode_operations_dotl = {
const struct inode_operations v9fs_symlink_inode_operations_dotl = {
.readlink = generic_readlink,
.follow_link = v9fs_vfs_follow_link_dotl,
- .put_link = v9fs_vfs_put_link,
+ .put_link = kfree_put_link,
.getattr = v9fs_vfs_getattr_dotl,
.setattr = v9fs_vfs_setattr_dotl,
.setxattr = generic_setxattr,
--
2.1.4
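
The ownership hand-off described in the commit message above can be shown in
a few lines of userspace C. This is a sketch under assumptions: toy_stat
stands in for the stat structure, toy_steal_extension() for the "feed the
string to nd_set_link() and clear the pointer" step, and the caller's final
free() plays the role of ->put_link(). None of these names come from the
patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stat result owning a heap-allocated string. */
struct toy_stat {
	char *extension;
};

/* Plain-C replacement for strdup(), to stay within the C standard library. */
static char *toy_dup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/*
 * Take over the string instead of copying it.  After this the stat no longer
 * owns it, so freeing the stat does not free the link body.
 */
static char *toy_steal_extension(struct toy_stat *st)
{
	char *body;

	assert(st != NULL);
	body = st->extension;
	st->extension = NULL;
	return body;
}

/* Free whatever the stat still owns. */
static void toy_stat_free(struct toy_stat *st)
{
	free(st->extension);
	st->extension = NULL;
}
```

The link body then stays valid after toy_stat_free() and is released exactly
once, by whoever received it.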

2015-05-05 05:23:18

by Al Viro

Subject: [PATCH 03/79] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link

From: NeilBrown <[email protected]>

ovl_follow_link currently calls ->put_link on an error path.
However ->put_link is about to change in a way that it will be
impossible to call it from ovl_follow_link.

So rearrange the code to avoid the need for that error path.
Specifically: move the kmalloc() call before the ->follow_link()
call to the subordinate filesystem.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/overlayfs/inode.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f1248..1b4b9c5e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
void *ret;
struct dentry *realdentry;
struct inode *realinode;
+ struct ovl_link_data *data = NULL;

realdentry = ovl_dentry_real(dentry);
realinode = realdentry->d_inode;
@@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
if (WARN_ON(!realinode->i_op->follow_link))
return ERR_PTR(-EPERM);

- ret = realinode->i_op->follow_link(realdentry, nd);
- if (IS_ERR(ret))
- return ret;
-
if (realinode->i_op->put_link) {
- struct ovl_link_data *data;
-
data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
- if (!data) {
- realinode->i_op->put_link(realdentry, nd, ret);
+ if (!data)
return ERR_PTR(-ENOMEM);
- }
data->realdentry = realdentry;
- data->cookie = ret;
+ }

- return data;
- } else {
- return NULL;
+ ret = realinode->i_op->follow_link(realdentry, nd);
+ if (IS_ERR(ret)) {
+ kfree(data);
+ return ret;
}
+
+ if (data)
+ data->cookie = ret;
+
+ return data;
}

static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
--
2.1.4

2015-05-05 05:23:06

by Al Viro

Subject: [PATCH 04/79] ext4: split inode_operations for encrypted symlinks off the rest

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/inode.c | 10 ++++++++--
fs/ext4/namei.c | 9 +++++----
fs/ext4/symlink.c | 30 ++++++++++--------------------
4 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ef267ad..2640913 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2849,6 +2849,7 @@ extern int ext4_mpage_readpages(struct address_space *mapping,
unsigned nr_pages);

/* symlink.c */
+extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
extern const struct inode_operations ext4_symlink_inode_operations;
extern const struct inode_operations ext4_fast_symlink_inode_operations;

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbd0654..a320558 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4211,8 +4211,14 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &ext4_dir_inode_operations;
inode->i_fop = &ext4_dir_operations;
} else if (S_ISLNK(inode->i_mode)) {
- if (ext4_inode_is_fast_symlink(inode) &&
- !ext4_encrypted_inode(inode)) {
+ if (ext4_encrypted_inode(inode)) {
+#ifdef CONFIG_EXT4_FS_ENCRYPTION
+ inode->i_op = &ext4_encrypted_symlink_inode_operations;
+ ext4_set_aops(inode);
+#else
+ BUILD_BUG();
+#endif
+ } else if (ext4_inode_is_fast_symlink(inode)) {
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 7223b0b..84b1920 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3266,10 +3266,12 @@ static int ext4_symlink(struct inode *dir,
goto err_drop_inode;
sd->len = cpu_to_le16(ostr.len);
disk_link.name = (char *) sd;
+ inode->i_op = &ext4_encrypted_symlink_inode_operations;
}

if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
- inode->i_op = &ext4_symlink_inode_operations;
+ if (!encryption_required)
+ inode->i_op = &ext4_symlink_inode_operations;
ext4_set_aops(inode);
/*
* We cannot call page_symlink() with transaction started
@@ -3309,9 +3311,8 @@ static int ext4_symlink(struct inode *dir,
} else {
/* clear the extent format for fast symlink */
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
- inode->i_op = encryption_required ?
- &ext4_symlink_inode_operations :
- &ext4_fast_symlink_inode_operations;
+ if (!encryption_required)
+ inode->i_op = &ext4_fast_symlink_inode_operations;
memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
disk_link.len);
inode->i_size = disk_link.len - 1;
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 19f78f2..4ea5a01 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -35,9 +35,6 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
int res;
u32 plen, max_size = inode->i_sb->s_blocksize;

- if (!ext4_encrypted_inode(inode))
- return page_follow_link_light(dentry, nd);
-
ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize);
if (IS_ERR(ctx))
return ctx;
@@ -97,18 +94,16 @@ errout:
return ERR_PTR(res);
}

-static void ext4_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
-{
- struct page *page = cookie;
-
- if (!page) {
- kfree(nd_get_link(nd));
- } else {
- kunmap(page);
- page_cache_release(page);
- }
-}
+const struct inode_operations ext4_encrypted_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = ext4_follow_link,
+ .put_link = kfree_put_link,
+ .setattr = ext4_setattr,
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = ext4_listxattr,
+ .removexattr = generic_removexattr,
+};
#endif

static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
@@ -120,13 +115,8 @@ static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)

const struct inode_operations ext4_symlink_inode_operations = {
.readlink = generic_readlink,
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
- .follow_link = ext4_follow_link,
- .put_link = ext4_put_link,
-#else
.follow_link = page_follow_link_light,
.put_link = page_put_link,
-#endif
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
--
2.1.4

2015-05-05 05:23:14

by Al Viro

Subject: [PATCH 05/79] libfs: simple_follow_link()

From: Al Viro <[email protected]>

let "fast" symlinks store the pointer to the body into ->i_link and
use simple_follow_link for ->follow_link()

Signed-off-by: Al Viro <[email protected]>
---
fs/inode.c | 1 +
fs/libfs.c | 13 +++++++++++++
include/linux/fs.h | 3 +++
3 files changed, 17 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index ea37cd1..952fb48 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -152,6 +152,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
+ inode->i_link = NULL;
inode->i_rdev = 0;
inode->dirtied_when = 0;

diff --git a/fs/libfs.c b/fs/libfs.c
index cb1fb4b..72e4e01 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1093,3 +1093,16 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
return -EINVAL;
}
EXPORT_SYMBOL(simple_nosetlease);
+
+void *simple_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+ nd_set_link(nd, d_inode(dentry)->i_link);
+ return NULL;
+}
+EXPORT_SYMBOL(simple_follow_link);
+
+const struct inode_operations simple_symlink_inode_operations = {
+ .follow_link = simple_follow_link,
+ .readlink = generic_readlink
+};
+EXPORT_SYMBOL(simple_symlink_inode_operations);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35ec87e..0ac758f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -656,6 +656,7 @@ struct inode {
struct pipe_inode_info *i_pipe;
struct block_device *i_bdev;
struct cdev *i_cdev;
+ char *i_link;
};

__u32 i_generation;
@@ -2721,6 +2722,8 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
+void *simple_follow_link(struct dentry *, struct nameidata *);
+extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);

--
2.1.4
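
A toy userspace model of the idea in the patch above, with hypothetical names
throughout: the inode-like structure records a pointer to the inline link
body once, at setup time, so a single generic "follow" helper can serve every
filesystem that stores short symlinks inline. The 60-byte i_data size is an
arbitrary choice for the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an inode with inline symlink storage. */
struct toy_inode {
	char *i_link;		/* set once, when the inode is set up */
	char i_data[60];	/* inline body; size chosen arbitrarily here */
};

/* Store a short target inline and point i_link at it; -1 if it won't fit. */
static int toy_init_fast_symlink(struct toy_inode *inode, const char *target)
{
	size_t len = strlen(target);

	if (len >= sizeof(inode->i_data))
		return -1;
	memcpy(inode->i_data, target, len + 1);
	inode->i_link = inode->i_data;
	return 0;
}

/* The generic helper: no per-filesystem knowledge, just hand back i_link. */
static const char *toy_simple_follow_link(const struct toy_inode *inode)
{
	assert(inode->i_link != NULL);
	return inode->i_link;
}
```

Since the helper only reads i_link, each filesystem's own follow routine
reduces to one assignment at inode setup, which is the pattern the following
patches in the series apply.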

2015-05-05 05:45:31

by Al Viro

Subject: [PATCH 06/79] ext2: use simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ext2/inode.c | 1 +
fs/ext2/namei.c | 3 ++-
fs/ext2/symlink.c | 10 +---------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f460ae3..5c09776 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1403,6 +1403,7 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
inode->i_mapping->a_ops = &ext2_aops;
} else if (S_ISLNK(inode->i_mode)) {
if (ext2_inode_is_fast_symlink(inode)) {
+ inode->i_link = (char *)ei->i_data;
inode->i_op = &ext2_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 3e074a9..13ec54a 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -189,7 +189,8 @@ static int ext2_symlink (struct inode * dir, struct dentry * dentry,
} else {
/* fast symlink */
inode->i_op = &ext2_fast_symlink_inode_operations;
- memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
+ inode->i_link = (char*)EXT2_I(inode)->i_data;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
mark_inode_dirty(inode);
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 20608f1..ae17179 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -19,14 +19,6 @@

#include "ext2.h"
#include "xattr.h"
-#include <linux/namei.h>
-
-static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext2_inode_info *ei = EXT2_I(d_inode(dentry));
- nd_set_link(nd, (char *)ei->i_data);
- return NULL;
-}

const struct inode_operations ext2_symlink_inode_operations = {
.readlink = generic_readlink,
@@ -43,7 +35,7 @@ const struct inode_operations ext2_symlink_inode_operations = {

const struct inode_operations ext2_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext2_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ext2_setattr,
#ifdef CONFIG_EXT2_FS_XATTR
.setxattr = generic_setxattr,
--
2.1.4

2015-05-05 05:23:52

by Al Viro

Subject: [PATCH 07/79] befs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/befs/linuxvfs.c | 24 +++++-------------------
1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 7943533..172e306 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -43,7 +43,6 @@ static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
static void *befs_follow_link(struct dentry *, struct nameidata *);
-static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -80,11 +79,6 @@ static const struct address_space_operations befs_aops = {
.bmap = befs_bmap,
};

-static const struct inode_operations befs_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = befs_fast_follow_link,
-};
-
static const struct inode_operations befs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = befs_follow_link,
@@ -403,10 +397,12 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &befs_dir_inode_operations;
inode->i_fop = &befs_dir_operations;
} else if (S_ISLNK(inode->i_mode)) {
- if (befs_ino->i_flags & BEFS_LONG_SYMLINK)
+ if (befs_ino->i_flags & BEFS_LONG_SYMLINK) {
inode->i_op = &befs_symlink_inode_operations;
- else
- inode->i_op = &befs_fast_symlink_inode_operations;
+ } else {
+ inode->i_link = befs_ino->i_data.symlink;
+ inode->i_op = &simple_symlink_inode_operations;
+ }
} else {
befs_error(sb, "Inode %lu is not a regular file, "
"directory or symlink. THAT IS WRONG! BeFS has no "
@@ -497,16 +493,6 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
return NULL;
}

-
-static void *
-befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
-
- nd_set_link(nd, befs_ino->i_data.symlink);
- return NULL;
-}
-
/*
* UTF-8 to NLS charset convert routine
*
--
2.1.4

2015-05-05 05:45:15

by Al Viro

Subject: [PATCH 08/79] ext3: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ext3/inode.c | 1 +
fs/ext3/namei.c | 3 ++-
fs/ext3/symlink.c | 10 +---------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 2ee2dc4..6c7e546 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2999,6 +2999,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &ext3_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
+ inode->i_link = (char *)ei->i_data;
} else {
inode->i_op = &ext3_symlink_inode_operations;
ext3_set_aops(inode);
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 4264b9b..c9e767c 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -2308,7 +2308,8 @@ retry:
}
} else {
inode->i_op = &ext3_fast_symlink_inode_operations;
- memcpy((char*)&EXT3_I(inode)->i_data,symname,l);
+ inode->i_link = (char*)&EXT3_I(inode)->i_data;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
EXT3_I(inode)->i_disksize = inode->i_size;
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index ea96df3..c08c590 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -17,17 +17,9 @@
* ext3 symlink handling code
*/

-#include <linux/namei.h>
#include "ext3.h"
#include "xattr.h"

-static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext3_inode_info *ei = EXT3_I(d_inode(dentry));
- nd_set_link(nd, (char*)ei->i_data);
- return NULL;
-}
-
const struct inode_operations ext3_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
@@ -43,7 +35,7 @@ const struct inode_operations ext3_symlink_inode_operations = {

const struct inode_operations ext3_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext3_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
.setxattr = generic_setxattr,
--
2.1.4
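
A minimal userspace sketch of the pattern behind this and the other simple_follow_link() conversions: the filesystem stores a pointer to the inline symlink body in inode->i_link when the inode is set up or the symlink is created, and the generic helper only has to hand that pointer back. The helper itself comes from [5/79] and is not shown in this part of the thread, so its shape here is an assumption. The toy_* names and the 60-byte inline buffer are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in; not the kernel's struct inode. */
struct toy_inode {
	char *i_link;		/* cached pointer to the symlink body */
	size_t i_size;		/* body length, not counting the NUL */
	char i_data[60];	/* inline storage, as for a "fast" symlink */
};

/* Creation path: copy the target inline and publish it via i_link. */
static int toy_symlink(struct toy_inode *inode, const char *symname)
{
	size_t l = strlen(symname) + 1;	/* include the trailing NUL */

	if (l > sizeof(inode->i_data))
		return -1;		/* a real fs would use a "slow" symlink */
	inode->i_link = inode->i_data;
	memcpy(inode->i_link, symname, l);
	inode->i_size = l - 1;
	return 0;
}

/* Assumed shape of the generic helper: return whatever i_link points at. */
static const char *toy_simple_follow_link(const struct toy_inode *inode)
{
	return inode->i_link;
}
```

With i_link set on both the iget and the symlink-creation paths, a per-filesystem ->follow_link() has nothing filesystem-specific left to do, which is why the diffs above can delete it.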

2015-05-05 05:23:40

by Al Viro

Subject: [PATCH 09/79] ext4: switch to simple_follow_link()

From: Al Viro <[email protected]>

for fast symlinks only, of course...

Signed-off-by: Al Viro <[email protected]>
---
fs/ext4/inode.c | 1 +
fs/ext4/namei.c | 4 +++-
fs/ext4/symlink.c | 9 +--------
3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a320558..aba562b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4219,6 +4219,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
BUILD_BUG();
#endif
} else if (ext4_inode_is_fast_symlink(inode)) {
+ inode->i_link = (char *)ei->i_data;
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 84b1920..99017eb 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3311,8 +3311,10 @@ static int ext4_symlink(struct inode *dir,
} else {
/* clear the extent format for fast symlink */
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
- if (!encryption_required)
+ if (!encryption_required) {
inode->i_op = &ext4_fast_symlink_inode_operations;
+ inode->i_link = (char *)&EXT4_I(inode)->i_data;
+ }
memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
disk_link.len);
inode->i_size = disk_link.len - 1;
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 4ea5a01..1720a13 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -106,13 +106,6 @@ const struct inode_operations ext4_encrypted_symlink_inode_operations = {
};
#endif

-static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext4_inode_info *ei = EXT4_I(d_inode(dentry));
- nd_set_link(nd, (char *) ei->i_data);
- return NULL;
-}
-
const struct inode_operations ext4_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
@@ -126,7 +119,7 @@ const struct inode_operations ext4_symlink_inode_operations = {

const struct inode_operations ext4_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext4_follow_fast_link,
+ .follow_link = simple_follow_link,
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
--
2.1.4

2015-05-05 05:30:08

by Al Viro

Subject: [PATCH 10/79] jffs2: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/jffs2/dir.c | 1 +
fs/jffs2/fs.c | 1 +
fs/jffs2/symlink.c | 45 +--------------------------------------------
3 files changed, 3 insertions(+), 44 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 1ba5c97..8118002 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -354,6 +354,7 @@ static int jffs2_symlink (struct inode *dir_i, struct dentry *dentry, const char
ret = -ENOMEM;
goto fail;
}
+ inode->i_link = f->target;

jffs2_dbg(1, "%s(): symlink's target '%s' cached\n",
__func__, (char *)f->target);
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index fe5ea08..60d86e8 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -294,6 +294,7 @@ struct inode *jffs2_iget(struct super_block *sb, unsigned long ino)

case S_IFLNK:
inode->i_op = &jffs2_symlink_inode_operations;
+ inode->i_link = f->target;
break;

case S_IFDIR:
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index 1fefa25..8ce2f24 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -9,58 +9,15 @@
*
*/

-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kernel.h>
-#include <linux/fs.h>
-#include <linux/namei.h>
#include "nodelist.h"

-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd);
-
const struct inode_operations jffs2_symlink_inode_operations =
{
.readlink = generic_readlink,
- .follow_link = jffs2_follow_link,
+ .follow_link = simple_follow_link,
.setattr = jffs2_setattr,
.setxattr = jffs2_setxattr,
.getxattr = jffs2_getxattr,
.listxattr = jffs2_listxattr,
.removexattr = jffs2_removexattr
};
-
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct jffs2_inode_info *f = JFFS2_INODE_INFO(d_inode(dentry));
- char *p = (char *)f->target;
-
- /*
- * We don't acquire the f->sem mutex here since the only data we
- * use is f->target.
- *
- * 1. If we are here the inode has already built and f->target has
- * to point to the target path.
- * 2. Nobody uses f->target (if the inode is symlink's inode). The
- * exception is inode freeing function which frees f->target. But
- * it can't be called while we are here and before VFS has
- * stopped using our f->target string which we provide by means of
- * nd_set_link() call.
- */
-
- if (!p) {
- pr_err("%s(): can't find symlink target\n", __func__);
- p = ERR_PTR(-EIO);
- }
- jffs2_dbg(1, "%s(): target path is '%s'\n",
- __func__, (char *)f->target);
-
- nd_set_link(nd, p);
-
- /*
- * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
- * since the only way that may cause f->target to be changed is iput() operation.
- * But VFS will not use f->target after iput() has been called.
- */
- return NULL;
-}
-
--
2.1.4

2015-05-05 05:29:58

by Al Viro

Subject: [PATCH 11/79] shmem: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
mm/shmem.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index de98137..7f6e2f8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2451,6 +2451,7 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return -ENOMEM;
}
inode->i_op = &shmem_short_symlink_operations;
+ inode->i_link = info->symlink;
} else {
error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
if (error) {
@@ -2474,12 +2475,6 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, SHMEM_I(d_inode(dentry))->symlink);
- return NULL;
-}
-
static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct page *page = NULL;
@@ -2642,7 +2637,7 @@ static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)

static const struct inode_operations shmem_short_symlink_operations = {
.readlink = generic_readlink,
- .follow_link = shmem_follow_short_symlink,
+ .follow_link = simple_follow_link,
#ifdef CONFIG_TMPFS_XATTR
.setxattr = shmem_setxattr,
.getxattr = shmem_getxattr,
--
2.1.4

2015-05-05 05:24:05

by Al Viro

Subject: [PATCH 12/79] debugfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/debugfs/file.c | 12 ------------
fs/debugfs/inode.c | 6 +++---
include/linux/debugfs.h | 1 -
3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 830a7e7..284f9aa 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -17,7 +17,6 @@
#include <linux/fs.h>
#include <linux/seq_file.h>
#include <linux/pagemap.h>
-#include <linux/namei.h>
#include <linux/debugfs.h>
#include <linux/io.h>
#include <linux/slab.h>
@@ -43,17 +42,6 @@ const struct file_operations debugfs_file_operations = {
.llseek = noop_llseek,
};

-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, d_inode(dentry)->i_private);
- return NULL;
-}
-
-const struct inode_operations debugfs_link_operations = {
- .readlink = generic_readlink,
- .follow_link = debugfs_follow_link,
-};
-
static int debugfs_u8_set(void *data, u64 val)
{
*(u8 *)data = val;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index c1e7ffb..7eaec88 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -174,7 +174,7 @@ static void debugfs_evict_inode(struct inode *inode)
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
if (S_ISLNK(inode->i_mode))
- kfree(inode->i_private);
+ kfree(inode->i_link);
}

static const struct super_operations debugfs_super_operations = {
@@ -511,8 +511,8 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
return failed_creating(dentry);
}
inode->i_mode = S_IFLNK | S_IRWXUGO;
- inode->i_op = &debugfs_link_operations;
- inode->i_private = link;
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = link;
d_instantiate(dentry, inode);
return end_creating(dentry);
}
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index cb25af4..420311b 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -45,7 +45,6 @@ extern struct dentry *arch_debugfs_dir;

/* declared over in file.c */
extern const struct file_operations debugfs_file_operations;
-extern const struct inode_operations debugfs_link_operations;

struct dentry *debugfs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
--
2.1.4
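
A rough userspace sketch of the ownership rule the debugfs change keeps intact: the symlink body is a heap allocation owned by the inode, now reachable through i_link instead of i_private, and it is released when the inode is evicted. The toy_* names are hypothetical, malloc()/free() stand in for the kernel allocator, and resetting the pointer to NULL after the free is an addition made only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parts of an inode that matter here. */
struct toy_inode {
	int is_symlink;
	char *i_link;		/* owned heap copy of the link target */
};

/* Creation: duplicate the target and hang it off i_link. */
static int toy_create_symlink(struct toy_inode *inode, const char *target)
{
	size_t n = strlen(target) + 1;
	char *link = malloc(n);

	if (!link)
		return -1;
	memcpy(link, target, n);
	inode->is_symlink = 1;
	inode->i_link = link;
	return 0;
}

/* Eviction: the inode owns the string, so it frees it exactly once. */
static void toy_evict_inode(struct toy_inode *inode)
{
	if (inode->is_symlink) {
		free(inode->i_link);
		inode->i_link = NULL;	/* sketch-only, for testability */
	}
}
```

The field used to store the pointer and the field freed at eviction have to change together, which is what the two fs/debugfs/inode.c hunks above do.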

2015-05-05 05:45:21

by Al Viro

Subject: [PATCH 13/79] ufs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ufs/inode.c | 5 +++--
fs/ufs/namei.c | 3 ++-
fs/ufs/symlink.c | 13 +------------
3 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index be7d42c..99aaf5c 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -572,9 +572,10 @@ static void ufs_set_inode_ops(struct inode *inode)
inode->i_fop = &ufs_dir_operations;
inode->i_mapping->a_ops = &ufs_aops;
} else if (S_ISLNK(inode->i_mode)) {
- if (!inode->i_blocks)
+ if (!inode->i_blocks) {
inode->i_op = &ufs_fast_symlink_inode_operations;
- else {
+ inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;
+ } else {
inode->i_op = &ufs_symlink_inode_operations;
inode->i_mapping->a_ops = &ufs_aops;
}
diff --git a/fs/ufs/namei.c b/fs/ufs/namei.c
index e491a93..f773deb 100644
--- a/fs/ufs/namei.c
+++ b/fs/ufs/namei.c
@@ -144,7 +144,8 @@ static int ufs_symlink (struct inode * dir, struct dentry * dentry,
} else {
/* fast symlink */
inode->i_op = &ufs_fast_symlink_inode_operations;
- memcpy(UFS_I(inode)->i_u1.i_symlink, symname, l);
+ inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
mark_inode_dirty(inode);
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index 5b537e2..874480b 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -25,23 +25,12 @@
* ext2 symlink handling code
*/

-#include <linux/fs.h>
-#include <linux/namei.h>
-
#include "ufs_fs.h"
#include "ufs.h"

-
-static void *ufs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ufs_inode_info *p = UFS_I(d_inode(dentry));
- nd_set_link(nd, (char*)p->i_u1.i_symlink);
- return NULL;
-}
-
const struct inode_operations ufs_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ufs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ufs_setattr,
};

--
2.1.4

2015-05-05 05:44:56

by Al Viro

Subject: [PATCH 14/79] ubifs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ubifs/dir.c | 1 +
fs/ubifs/file.c | 11 +----------
fs/ubifs/super.c | 1 +
3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 27060fc..5c27c66 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -889,6 +889,7 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,

memcpy(ui->data, symname, len);
((char *)ui->data)[len] = '\0';
+ inode->i_link = ui->data;
/*
* The terminating zero byte is not written to the flash media and it
* is put just to make later in-memory string processing simpler. Thus,
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 35efc10..a3dfe2a 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -51,7 +51,6 @@

#include "ubifs.h"
#include <linux/mount.h>
-#include <linux/namei.h>
#include <linux/slab.h>

static int read_block(struct inode *inode, void *addr, unsigned int block,
@@ -1300,14 +1299,6 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
ClearPageChecked(page);
}

-static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ubifs_inode *ui = ubifs_inode(d_inode(dentry));
-
- nd_set_link(nd, ui->data);
- return NULL;
-}
-
int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file->f_mapping->host;
@@ -1570,7 +1561,7 @@ const struct inode_operations ubifs_file_inode_operations = {

const struct inode_operations ubifs_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ubifs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ubifs_setattr,
.getattr = ubifs_getattr,
.setxattr = ubifs_setxattr,
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 75e6f04..20f5dbd 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -195,6 +195,7 @@ struct inode *ubifs_iget(struct super_block *sb, unsigned long inum)
}
memcpy(ui->data, ino->data, ui->data_len);
((char *)ui->data)[ui->data_len] = '\0';
+ inode->i_link = ui->data;
break;
case S_IFBLK:
case S_IFCHR:
--
2.1.4

2015-05-05 05:29:35

by Al Viro

Subject: [PATCH 15/79] sysv: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/sysv/Makefile | 2 +-
fs/sysv/inode.c | 5 +++--
fs/sysv/symlink.c | 20 --------------------
fs/sysv/sysv.h | 1 -
4 files changed, 4 insertions(+), 24 deletions(-)
delete mode 100644 fs/sysv/symlink.c

diff --git a/fs/sysv/Makefile b/fs/sysv/Makefile
index 3591f9d..1926224 100644
--- a/fs/sysv/Makefile
+++ b/fs/sysv/Makefile
@@ -5,4 +5,4 @@
obj-$(CONFIG_SYSV_FS) += sysv.o

sysv-objs := ialloc.o balloc.o inode.o itree.o file.o dir.o \
- namei.o super.o symlink.o
+ namei.o super.o
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index 8895630..590ad92 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -166,8 +166,9 @@ void sysv_set_inode(struct inode *inode, dev_t rdev)
inode->i_op = &sysv_symlink_inode_operations;
inode->i_mapping->a_ops = &sysv_aops;
} else {
- inode->i_op = &sysv_fast_symlink_inode_operations;
- nd_terminate_link(SYSV_I(inode)->i_data, inode->i_size,
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)SYSV_I(inode)->i_data;
+ nd_terminate_link(inode->i_link, inode->i_size,
sizeof(SYSV_I(inode)->i_data) - 1);
}
} else
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
deleted file mode 100644
index d3fa0d7..0000000
--- a/fs/sysv/symlink.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * linux/fs/sysv/symlink.c
- *
- * Handling of System V filesystem fast symlinks extensions.
- * Aug 2001, Christoph Hellwig ([email protected])
- */
-
-#include "sysv.h"
-#include <linux/namei.h>
-
-static void *sysv_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, (char *)SYSV_I(d_inode(dentry))->i_data);
- return NULL;
-}
-
-const struct inode_operations sysv_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = sysv_follow_link,
-};
diff --git a/fs/sysv/sysv.h b/fs/sysv/sysv.h
index 69d4889..2c13525 100644
--- a/fs/sysv/sysv.h
+++ b/fs/sysv/sysv.h
@@ -161,7 +161,6 @@ extern ino_t sysv_inode_by_name(struct dentry *);

extern const struct inode_operations sysv_file_inode_operations;
extern const struct inode_operations sysv_dir_inode_operations;
-extern const struct inode_operations sysv_fast_symlink_inode_operations;
extern const struct file_operations sysv_file_operations;
extern const struct file_operations sysv_dir_operations;
extern const struct address_space_operations sysv_aops;
--
2.1.4

2015-05-05 05:45:08

by Al Viro

Subject: [PATCH 16/79] jfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/jfs/inode.c | 3 ++-
fs/jfs/namei.c | 5 ++---
fs/jfs/symlink.c | 10 +---------
3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 070dc4b..6f1cb2b 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -63,11 +63,12 @@ struct inode *jfs_iget(struct super_block *sb, unsigned long ino)
inode->i_mapping->a_ops = &jfs_aops;
} else {
inode->i_op = &jfs_fast_symlink_inode_operations;
+ inode->i_link = JFS_IP(inode)->i_inline;
/*
* The inline data should be null-terminated, but
* don't let on-disk corruption crash the kernel
*/
- JFS_IP(inode)->i_inline[inode->i_size] = '\0';
+ inode->i_link[inode->i_size] = '\0';
}
} else {
inode->i_op = &jfs_file_inode_operations;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 66db7bc..e33be92 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -880,7 +880,6 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
int ssize; /* source pathname size */
struct btstack btstack;
struct inode *ip = d_inode(dentry);
- unchar *i_fastsymlink;
s64 xlen = 0;
int bmask = 0, xsize;
s64 xaddr;
@@ -946,8 +945,8 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
if (ssize <= IDATASIZE) {
ip->i_op = &jfs_fast_symlink_inode_operations;

- i_fastsymlink = JFS_IP(ip)->i_inline;
- memcpy(i_fastsymlink, name, ssize);
+ ip->i_link = JFS_IP(ip)->i_inline;
+ memcpy(ip->i_link, name, ssize);
ip->i_size = ssize - 1;

/*
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 80f42bc..5929e23 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -17,21 +17,13 @@
*/

#include <linux/fs.h>
-#include <linux/namei.h>
#include "jfs_incore.h"
#include "jfs_inode.h"
#include "jfs_xattr.h"

-static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- char *s = JFS_IP(d_inode(dentry))->i_inline;
- nd_set_link(nd, s);
- return NULL;
-}
-
const struct inode_operations jfs_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = jfs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = jfs_setattr,
.setxattr = jfs_setxattr,
.getxattr = jfs_getxattr,
--
2.1.4

2015-05-05 05:29:22

by Al Viro

Subject: [PATCH 17/79] freevxfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/freevxfs/vxfs_extern.h | 3 ---
fs/freevxfs/vxfs_immed.c | 34 ----------------------------------
fs/freevxfs/vxfs_inode.c | 7 +++++--
3 files changed, 5 insertions(+), 39 deletions(-)

diff --git a/fs/freevxfs/vxfs_extern.h b/fs/freevxfs/vxfs_extern.h
index 881aa3d..e3dcb44 100644
--- a/fs/freevxfs/vxfs_extern.h
+++ b/fs/freevxfs/vxfs_extern.h
@@ -50,9 +50,6 @@ extern daddr_t vxfs_bmap1(struct inode *, long);
/* vxfs_fshead.c */
extern int vxfs_read_fshead(struct super_block *);

-/* vxfs_immed.c */
-extern const struct inode_operations vxfs_immed_symlink_iops;
-
/* vxfs_inode.c */
extern const struct address_space_operations vxfs_immed_aops;
extern struct kmem_cache *vxfs_inode_cachep;
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index 8b9229e..cb84f0f 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -32,29 +32,15 @@
*/
#include <linux/fs.h>
#include <linux/pagemap.h>
-#include <linux/namei.h>

#include "vxfs.h"
#include "vxfs_extern.h"
#include "vxfs_inode.h"


-static void * vxfs_immed_follow_link(struct dentry *, struct nameidata *);
-
static int vxfs_immed_readpage(struct file *, struct page *);

/*
- * Inode operations for immed symlinks.
- *
- * Unliked all other operations we do not go through the pagecache,
- * but do all work directly on the inode.
- */
-const struct inode_operations vxfs_immed_symlink_iops = {
- .readlink = generic_readlink,
- .follow_link = vxfs_immed_follow_link,
-};
-
-/*
* Address space operations for immed files and directories.
*/
const struct address_space_operations vxfs_immed_aops = {
@@ -62,26 +48,6 @@ const struct address_space_operations vxfs_immed_aops = {
};

/**
- * vxfs_immed_follow_link - follow immed symlink
- * @dp: dentry for the link
- * @np: pathname lookup data for the current path walk
- *
- * Description:
- * vxfs_immed_follow_link restarts the pathname lookup with
- * the data obtained from @dp.
- *
- * Returns:
- * Zero on success, else a negative error code.
- */
-static void *
-vxfs_immed_follow_link(struct dentry *dp, struct nameidata *np)
-{
- struct vxfs_inode_info *vip = VXFS_INO(d_inode(dp));
- nd_set_link(np, vip->vii_immed.vi_immed);
- return NULL;
-}
-
-/**
* vxfs_immed_readpage - read part of an immed inode into pagecache
* @file: file context (unused)
* @page: page frame to fill in.
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 363e3ae..ef73ed6 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -35,6 +35,7 @@
#include <linux/pagemap.h>
#include <linux/kernel.h>
#include <linux/slab.h>
+#include <linux/namei.h>

#include "vxfs.h"
#include "vxfs_inode.h"
@@ -327,8 +328,10 @@ vxfs_iget(struct super_block *sbp, ino_t ino)
ip->i_op = &page_symlink_inode_operations;
ip->i_mapping->a_ops = &vxfs_aops;
} else {
- ip->i_op = &vxfs_immed_symlink_iops;
- vip->vii_immed.vi_immed[ip->i_size] = '\0';
+ ip->i_op = &simple_symlink_inode_operations;
+ ip->i_link = vip->vii_immed.vi_immed;
+ nd_terminate_link(ip->i_link, ip->i_size,
+ sizeof(vip->vii_immed.vi_immed) - 1);
}
} else
init_special_inode(ip, ip->i_mode, old_decode_dev(vip->vii_rdev));
--
2.1.4
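
The last hunk above replaces a direct store at index ip->i_size with nd_terminate_link(), bounded by the size of the inline buffer. A small userspace sketch of why the bound matters, assuming the helper places the NUL at min(len, maxlen); toy_terminate_link() is a hypothetical stand-in written for this illustration, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Terminate the string in @name at @len, but never past @maxlen.
 * Callers pass sizeof(buffer) - 1 as @maxlen, so a bogus on-disk
 * length cannot push the store outside the inline buffer.
 */
static void toy_terminate_link(void *name, size_t len, size_t maxlen)
{
	((char *)name)[len < maxlen ? len : maxlen] = '\0';
}
```

With a sane length the NUL lands right after the link body; with a corrupted, oversized length it lands in the last byte of the buffer instead of somewhere beyond it.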

2015-05-05 05:29:19

by Al Viro

Subject: [PATCH 18/79] exofs: switch to {simple,page}_symlink_inode_operations

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/exofs/Kbuild | 2 +-
fs/exofs/exofs.h | 4 ----
fs/exofs/inode.c | 9 +++++----
fs/exofs/namei.c | 5 +++--
fs/exofs/symlink.c | 55 ------------------------------------------------------
5 files changed, 9 insertions(+), 66 deletions(-)
delete mode 100644 fs/exofs/symlink.c

diff --git a/fs/exofs/Kbuild b/fs/exofs/Kbuild
index b47c7b8..a364fd0 100644
--- a/fs/exofs/Kbuild
+++ b/fs/exofs/Kbuild
@@ -16,5 +16,5 @@
libore-y := ore.o ore_raid.o
obj-$(CONFIG_ORE) += libore.o

-exofs-y := inode.o file.o symlink.o namei.o dir.o super.o sys.o
+exofs-y := inode.o file.o namei.o dir.o super.o sys.o
obj-$(CONFIG_EXOFS_FS) += exofs.o
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index ad9cac6..2e86086 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -207,10 +207,6 @@ extern const struct address_space_operations exofs_aops;
extern const struct inode_operations exofs_dir_inode_operations;
extern const struct inode_operations exofs_special_inode_operations;

-/* symlink.c */
-extern const struct inode_operations exofs_symlink_inode_operations;
-extern const struct inode_operations exofs_fast_symlink_inode_operations;
-
/* exofs_init_comps will initialize an ore_components device array
* pointing to a single ore_comp struct, and a round-robin view
* of the device table.
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 786e4cc..73c64da 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -1222,10 +1222,11 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
inode->i_fop = &exofs_dir_operations;
inode->i_mapping->a_ops = &exofs_aops;
} else if (S_ISLNK(inode->i_mode)) {
- if (exofs_inode_is_fast_symlink(inode))
- inode->i_op = &exofs_fast_symlink_inode_operations;
- else {
- inode->i_op = &exofs_symlink_inode_operations;
+ if (exofs_inode_is_fast_symlink(inode)) {
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)oi->i_data;
+ } else {
+ inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &exofs_aops;
}
} else {
diff --git a/fs/exofs/namei.c b/fs/exofs/namei.c
index 5ae25e4..09a6bb1 100644
--- a/fs/exofs/namei.c
+++ b/fs/exofs/namei.c
@@ -113,7 +113,7 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
oi = exofs_i(inode);
if (l > sizeof(oi->i_data)) {
/* slow symlink */
- inode->i_op = &exofs_symlink_inode_operations;
+ inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &exofs_aops;
memset(oi->i_data, 0, sizeof(oi->i_data));

@@ -122,7 +122,8 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
goto out_fail;
} else {
/* fast symlink */
- inode->i_op = &exofs_fast_symlink_inode_operations;
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)oi->i_data;
memcpy(oi->i_data, symname, l);
inode->i_size = l-1;
}
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
deleted file mode 100644
index 6f6f3a4..0000000
--- a/fs/exofs/symlink.c
+++ /dev/null
@@ -1,55 +0,0 @@
-/*
- * Copyright (C) 2005, 2006
- * Avishay Traeger ([email protected])
- * Copyright (C) 2008, 2009
- * Boaz Harrosh <[email protected]>
- *
- * Copyrights for code taken from ext2:
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card ([email protected])
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- * from
- * linux/fs/minix/inode.c
- * Copyright (C) 1991, 1992 Linus Torvalds
- *
- * This file is part of exofs.
- *
- * exofs is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation. Since it is based on ext2, and the only
- * valid version of GPL for the Linux kernel is version 2, the only valid
- * version of GPL for exofs is version 2.
- *
- * exofs is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with exofs; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include <linux/namei.h>
-
-#include "exofs.h"
-
-static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct exofs_i_info *oi = exofs_i(d_inode(dentry));
-
- nd_set_link(nd, (char *)oi->i_data);
- return NULL;
-}
-
-const struct inode_operations exofs_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = page_follow_link_light,
- .put_link = page_put_link,
-};
-
-const struct inode_operations exofs_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = exofs_follow_link,
-};
--
2.1.4

2015-05-05 05:28:25

by Al Viro

Subject: [PATCH 19/79] ceph: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ceph/inode.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e876e19..571acd8 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -6,7 +6,6 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/kernel.h>
-#include <linux/namei.h>
#include <linux/writeback.h>
#include <linux/vmalloc.h>
#include <linux/posix_acl.h>
@@ -819,6 +818,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
else
kfree(sym); /* lost a race */
}
+ inode->i_link = ci->i_symlink;
break;
case S_IFDIR:
inode->i_op = &ceph_dir_iops;
@@ -1691,16 +1691,9 @@ retry:
/*
* symlinks
*/
-static void *ceph_sym_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ceph_inode_info *ci = ceph_inode(d_inode(dentry));
- nd_set_link(nd, ci->i_symlink);
- return NULL;
-}
-
static const struct inode_operations ceph_symlink_iops = {
.readlink = generic_readlink,
- .follow_link = ceph_sym_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ceph_setattr,
.getattr = ceph_getattr,
.setxattr = ceph_setxattr,
--
2.1.4

2015-05-05 05:44:19

by Al Viro

Subject: [PATCH 20/79] logfs: fix a pagecache leak for symlinks

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/logfs/dir.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/logfs/dir.c b/fs/logfs/dir.c
index 4cf38f1..f9b45d4 100644
--- a/fs/logfs/dir.c
+++ b/fs/logfs/dir.c
@@ -779,6 +779,7 @@ fail:
const struct inode_operations logfs_symlink_iops = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
+ .put_link = page_put_link,
};

const struct inode_operations logfs_dir_iops = {
--
2.1.4
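
A userspace sketch of the leak this one-liner addresses, assuming a page-based ->follow_link() takes a reference on the page holding the symlink body and that only ->put_link() drops it again. The toy_* types and the plain integer refcount are hypothetical stand-ins for the page cache machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page holding the link body. */
struct toy_page {
	int refcount;
	const char *data;
};

struct toy_symlink_ops {
	const char *(*follow_link)(struct toy_page *page, void **cookie);
	void (*put_link)(void *cookie);	/* may be NULL, as in the old table */
};

/* Pin the page and return the body; the cookie remembers what to unpin. */
static const char *toy_page_follow_link(struct toy_page *page, void **cookie)
{
	page->refcount++;
	*cookie = page;
	return page->data;
}

/* Drop the reference taken by toy_page_follow_link(). */
static void toy_page_put_link(void *cookie)
{
	struct toy_page *page = cookie;

	page->refcount--;
}

/* One traversal as the path walker would do it; returns the final refcount. */
static int toy_traverse(const struct toy_symlink_ops *ops, struct toy_page *page)
{
	void *cookie = NULL;
	const char *body = ops->follow_link(page, &cookie);

	(void)body;			/* a real walker would resolve it here */
	if (ops->put_link)
		ops->put_link(cookie);
	return page->refcount;
}
```

With put_link left NULL every traversal leaves one extra reference on the page; with both methods set the count returns to where it started.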

2015-05-05 05:44:11

by Al Viro

Subject: [PATCH 21/79] SECURITY: remove nameidata arg from inode_follow_link.

From: NeilBrown <[email protected]>

No ->inode_follow_link() method uses the nameidata arg, and
struct nameidata is about to become private to namei.c.
So remove the argument from all inode_follow_link() functions.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
include/linux/security.h | 9 +++------
security/capability.c | 3 +--
security/security.c | 4 ++--
security/selinux/hooks.c | 2 +-
5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4a8d998b..cd0ffc9f18 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -871,7 +871,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
touch_atime(link);
nd_set_link(nd, NULL);

- error = security_inode_follow_link(link->dentry, nd);
+ error = security_inode_follow_link(dentry);
if (error)
goto out_put_nd_path;

diff --git a/include/linux/security.h b/include/linux/security.h
index 18264ea..62a6620 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -43,7 +43,6 @@ struct file;
struct vfsmount;
struct path;
struct qstr;
-struct nameidata;
struct iattr;
struct fown_struct;
struct file_operations;
@@ -477,7 +476,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* @inode_follow_link:
* Check permission to follow a symbolic link when looking up a pathname.
* @dentry contains the dentry structure for the link.
- * @nd contains the nameidata structure for the parent directory.
* Return 0 if permission is granted.
* @inode_permission:
* Check permission before accessing an inode. This hook is called by the
@@ -1553,7 +1551,7 @@ struct security_operations {
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
int (*inode_readlink) (struct dentry *dentry);
- int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
+ int (*inode_follow_link) (struct dentry *dentry);
int (*inode_permission) (struct inode *inode, int mask);
int (*inode_setattr) (struct dentry *dentry, struct iattr *attr);
int (*inode_getattr) (const struct path *path);
@@ -1839,7 +1837,7 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags);
int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
+int security_inode_follow_link(struct dentry *dentry);
int security_inode_permission(struct inode *inode, int mask);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
int security_inode_getattr(const struct path *path);
@@ -2241,8 +2239,7 @@ static inline int security_inode_readlink(struct dentry *dentry)
return 0;
}

-static inline int security_inode_follow_link(struct dentry *dentry,
- struct nameidata *nd)
+static inline int security_inode_follow_link(struct dentry *dentry)
{
return 0;
}
diff --git a/security/capability.c b/security/capability.c
index 0d03fcc..fae99db 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,8 +209,7 @@ static int cap_inode_readlink(struct dentry *dentry)
return 0;
}

-static int cap_inode_follow_link(struct dentry *dentry,
- struct nameidata *nameidata)
+static int cap_inode_follow_link(struct dentry *dentry)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 8e9b1f4..d7c30b0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,11 @@ int security_inode_readlink(struct dentry *dentry)
return security_ops->inode_readlink(dentry);
}

-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
+int security_inode_follow_link(struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
- return security_ops->inode_follow_link(dentry, nd);
+ return security_ops->inode_follow_link(dentry);
}

int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7dade28..8016229 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2861,7 +2861,7 @@ static int selinux_inode_readlink(struct dentry *dentry)
return dentry_has_perm(cred, dentry, FILE__READ);
}

-static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *nameidata)
+static int selinux_inode_follow_link(struct dentry *dentry)
{
const struct cred *cred = current_cred();

--
2.1.4

2015-05-05 05:28:16

by Al Viro

Subject: [PATCH 22/79] uninline walk_component()

From: Al Viro <[email protected]>

seriously improves the stack *and* I-cache footprint...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cd0ffc9f18..955dc7f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1561,8 +1561,7 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static inline int walk_component(struct nameidata *nd, struct path *path,
- int follow)
+static int walk_component(struct nameidata *nd, struct path *path, int follow)
{
struct inode *inode;
int err;
--
2.1.4

2015-05-05 05:27:45

by Al Viro

Subject: [PATCH 23/79] namei: take O_NOFOLLOW treatment into do_last()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 955dc7f..308f325 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3052,6 +3052,11 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
+ if (!(nd->flags & LOOKUP_FOLLOW)) {
+ path_put_conditional(path, nd);
+ path_put(&nd->path);
+ return -ELOOP;
+ }
return 1;
}

@@ -3236,12 +3241,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = path;
void *cookie;
- if (!(nd->flags & LOOKUP_FOLLOW)) {
- path_put_conditional(&path, nd);
- path_put(&nd->path);
- error = -ELOOP;
- break;
- }
error = may_follow_link(&link, nd);
if (unlikely(error))
break;
--
2.1.4

2015-05-05 05:42:25

by Al Viro

Subject: [PATCH 24/79] do_last: kill symlink_ok

From: Al Viro <[email protected]>

When O_PATH is present, O_CREAT isn't, so symlink_ok is always equal to
(open_flags & O_PATH) && !(nd->flags & LOOKUP_FOLLOW).

Signed-off-by: Al Viro <[email protected]>
---
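The equivalence claimed in the commit message is De Morgan's law applied to the old assignment. A minimal userspace sketch that checks all four flag combinations follows; the `TOY_*` flag values and the `toy_*` function names are made up for illustration and are not the kernel's definitions. It assumes, as the message states, that O_CREAT is never set together with O_PATH, so the old `!(open_flag & O_CREAT)` guard does not affect the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values, chosen only for this sketch. */
#define TOY_O_PATH        0x1u
#define TOY_LOOKUP_FOLLOW 0x2u

/* "follow" argument before the patch: !symlink_ok. */
static bool toy_follow_old(unsigned int open_flag, unsigned int nd_flags)
{
	bool symlink_ok = false;

	if ((open_flag & TOY_O_PATH) && !(nd_flags & TOY_LOOKUP_FOLLOW))
		symlink_ok = true;
	return !symlink_ok;
}

/* "follow" argument after the patch, with symlink_ok folded away. */
static bool toy_follow_new(unsigned int open_flag, unsigned int nd_flags)
{
	return !(open_flag & TOY_O_PATH) || (nd_flags & TOY_LOOKUP_FOLLOW);
}

/* Walk all four combinations and insist that both forms agree. */
static void toy_check_equivalence(void)
{
	for (unsigned int o = 0; o < 2; o++) {
		for (unsigned int f = 0; f < 2; f++) {
			unsigned int open_flag = o ? TOY_O_PATH : 0;
			unsigned int nd_flags = f ? TOY_LOOKUP_FOLLOW : 0;

			assert(toy_follow_old(open_flag, nd_flags) ==
			       toy_follow_new(open_flag, nd_flags));
		}
	}
}
```

Calling `toy_check_equivalence()` from a test harness exercises the whole truth table; the only combination where neither form asks to follow is O_PATH without LOOKUP_FOLLOW.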
fs/namei.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 308f325..58293a8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2923,7 +2923,6 @@ static int do_last(struct nameidata *nd, struct path *path,
bool got_write = false;
int acc_mode = op->acc_mode;
struct inode *inode;
- bool symlink_ok = false;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
bool retried = false;
int error;
@@ -2941,8 +2940,6 @@ static int do_last(struct nameidata *nd, struct path *path,
if (!(open_flag & O_CREAT)) {
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
- if (open_flag & O_PATH && !(nd->flags & LOOKUP_FOLLOW))
- symlink_ok = true;
/* we _can_ be in RCU mode here */
error = lookup_fast(nd, path, &inode);
if (likely(!error))
@@ -3043,7 +3040,8 @@ finish_lookup:
goto out;
}

- if (should_follow_link(path->dentry, !symlink_ok)) {
+ if (should_follow_link(path->dentry,
+ !(open_flag & O_PATH) || (nd->flags & LOOKUP_FOLLOW))) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path->mnt ||
unlazy_walk(nd, path->dentry))) {
--
2.1.4

2015-05-05 05:42:09

by Al Viro

Subject: [PATCH 25/79] do_last: regularize the logic around following symlinks

From: Al Viro <[email protected]>

With LOOKUP_FOLLOW we unlazy and return 1; without it we either
fail with ELOOP or, for O_PATH opens, succeed. No need to mix
those cases...

Signed-off-by: Al Viro <[email protected]>
---
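The three outcomes described in the commit message can be written as a small decision function. This is only a userspace model of the control flow after this patch, not the kernel code: `toy_last_component()`, its arguments and the `TOY_*` flag values are hypothetical, and the real do_last() also handles RCU mode, reference counting and mounts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag values, chosen only for this sketch. */
#define TOY_O_PATH        0x1u
#define TOY_LOOKUP_FOLLOW 0x2u

/*
 * Returns 1 if the caller should go on and follow the symlink,
 * -ELOOP if the open has to fail, and 0 if the object itself
 * (possibly a symlink, for O_PATH opens) is what gets opened.
 */
static int toy_last_component(bool is_symlink, unsigned int open_flag,
			      unsigned int nd_flags)
{
	if (is_symlink && (nd_flags & TOY_LOOKUP_FOLLOW))
		return 1;		/* with LOOKUP_FOLLOW: follow it */
	if (is_symlink && !(open_flag & TOY_O_PATH))
		return -ELOOP;		/* no follow, not O_PATH: fail */
	return 0;			/* O_PATH open, or not a symlink */
}

/* The asserts spell out the cases named in the commit message. */
static void toy_check_cases(void)
{
	assert(toy_last_component(true, 0, TOY_LOOKUP_FOLLOW) == 1);
	assert(toy_last_component(true, 0, 0) == -ELOOP);
	assert(toy_last_component(true, TOY_O_PATH, 0) == 0);
}
```

Keeping the "follow" test and the "fail with ELOOP" test as two separate conditions mirrors what the patch does: the first branch no longer needs to know about O_PATH at all.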
fs/namei.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 58293a8..02b7952 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3040,8 +3040,7 @@ finish_lookup:
goto out;
}

- if (should_follow_link(path->dentry,
- !(open_flag & O_PATH) || (nd->flags & LOOKUP_FOLLOW))) {
+ if (should_follow_link(path->dentry, nd->flags & LOOKUP_FOLLOW)) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path->mnt ||
unlazy_walk(nd, path->dentry))) {
@@ -3050,14 +3049,15 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
- if (!(nd->flags & LOOKUP_FOLLOW)) {
- path_put_conditional(path, nd);
- path_put(&nd->path);
- return -ELOOP;
- }
return 1;
}

+ if (unlikely(d_is_symlink(path->dentry)) && !(open_flag & O_PATH)) {
+ path_to_nameidata(path, nd);
+ error = -ELOOP;
+ goto out;
+ }
+
if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path->mnt) {
path_to_nameidata(path, nd);
} else {
--
2.1.4

2015-05-05 05:42:03

by Al Viro

Subject: [PATCH 26/79] namei: get rid of lookup_hash()

From: Al Viro <[email protected]>

it's a convenient helper, but we'll want to shift nameidata
down the call chain, so it won't be available there...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 02b7952..1c3803f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2120,16 +2120,6 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(vfs_path_lookup);

-/*
- * Restricted form of lookup. Doesn't follow links, single-component only,
- * needs parent already locked. Doesn't follow mounts.
- * SMP-safe.
- */
-static struct dentry *lookup_hash(struct nameidata *nd)
-{
- return __lookup_hash(&nd->last, nd->path.dentry, nd->flags);
-}
-
/**
* lookup_one_len - filesystem helper to lookup single pathname component
* @name: pathname component to lookup
@@ -3344,7 +3334,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
* Do the final lookup.
*/
mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
if (IS_ERR(dentry))
goto unlock;

@@ -3658,7 +3648,7 @@ retry:
goto exit1;

mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
@@ -3778,7 +3768,7 @@ retry:
goto exit1;
retry_deleg:
mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
/* Why not before? Because we want correct error value */
@@ -4297,7 +4287,7 @@ retry:
retry_deleg:
trap = lock_rename(new_dir, old_dir);

- old_dentry = lookup_hash(&oldnd);
+ old_dentry = __lookup_hash(&oldnd.last, oldnd.path.dentry, oldnd.flags);
error = PTR_ERR(old_dentry);
if (IS_ERR(old_dentry))
goto exit3;
@@ -4305,7 +4295,7 @@ retry_deleg:
error = -ENOENT;
if (d_is_negative(old_dentry))
goto exit4;
- new_dentry = lookup_hash(&newnd);
+ new_dentry = __lookup_hash(&newnd.last, newnd.path.dentry, newnd.flags);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto exit4;
--
2.1.4

2015-05-05 05:27:24

by Al Viro

Subject: [PATCH 27/79] namei: shift nameidata down into user_path_parent()

From: Al Viro <[email protected]>

that avoids having nameidata on stack during the calls of
->rmdir()/->unlink() and *two* of those during the calls
of ->rename().

Signed-off-by: Al Viro <[email protected]>
---
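The stack saving comes from keeping the large lookup state local to the helper and copying out only the three pieces the callers read. A rough userspace sketch of that pattern follows; every `toy_*` type and function is a hypothetical, heavily reduced stand-in, and the string splitting merely imitates a parent lookup.

```c
#include <assert.h>
#include <string.h>

/* Reduced stand-ins for struct path, struct qstr and the LAST_* types. */
struct toy_path { const char *dir; size_t dir_len; };
struct toy_qstr { const char *name; size_t len; };
enum toy_last_type { TOY_LAST_NORM, TOY_LAST_DOT, TOY_LAST_DOTDOT };

struct toy_nameidata {
	struct toy_path path;
	struct toy_qstr last;
	enum toy_last_type last_type;
	unsigned int flags;
	char other_state[256];	/* stands in for state callers never read */
};

/* Pretend lookup: split the pathname at its last slash. */
static void toy_lookup_parent(const char *pathname, struct toy_nameidata *nd)
{
	const char *slash = strrchr(pathname, '/');

	memset(nd, 0, sizeof(*nd));
	nd->path.dir = pathname;
	nd->path.dir_len = slash ? (size_t)(slash - pathname) : 0;
	nd->last.name = slash ? slash + 1 : pathname;
	nd->last.len = strlen(nd->last.name);
	if (strcmp(nd->last.name, ".") == 0)
		nd->last_type = TOY_LAST_DOT;
	else if (strcmp(nd->last.name, "..") == 0)
		nd->last_type = TOY_LAST_DOTDOT;
	else
		nd->last_type = TOY_LAST_NORM;
}

/* Only parent, last and type escape; nd is gone once we return. */
static void toy_path_parent(const char *pathname, struct toy_path *parent,
			    struct toy_qstr *last, enum toy_last_type *type)
{
	struct toy_nameidata nd;

	assert(pathname && parent && last && type);
	toy_lookup_parent(pathname, &nd);
	*parent = nd.path;
	*last = nd.last;
	*type = nd.last_type;
}
```

A caller modelled on do_rmdir() would then hold a `toy_path`, a `toy_qstr` and an `int`-sized type on its own stack while the filesystem method runs, instead of the whole lookup state.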
fs/namei.c | 124 +++++++++++++++++++++++++++++++++----------------------------
1 file changed, 67 insertions(+), 57 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1c3803f..fed19ad 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2203,9 +2203,13 @@ EXPORT_SYMBOL(user_path_at);
* path-walking is complete.
*/
static struct filename *
-user_path_parent(int dfd, const char __user *path, struct nameidata *nd,
+user_path_parent(int dfd, const char __user *path,
+ struct path *parent,
+ struct qstr *last,
+ int *type,
unsigned int flags)
{
+ struct nameidata nd;
struct filename *s = getname(path);
int error;

@@ -2215,11 +2219,14 @@ user_path_parent(int dfd, const char __user *path, struct nameidata *nd,
if (IS_ERR(s))
return s;

- error = filename_lookup(dfd, s, flags | LOOKUP_PARENT, nd);
+ error = filename_lookup(dfd, s, flags | LOOKUP_PARENT, &nd);
if (error) {
putname(s);
return ERR_PTR(error);
}
+ *parent = nd.path;
+ *last = nd.last;
+ *type = nd.last_type;

return s;
}
@@ -3623,14 +3630,17 @@ static long do_rmdir(int dfd, const char __user *pathname)
int error = 0;
struct filename *name;
struct dentry *dentry;
- struct nameidata nd;
+ struct path path;
+ struct qstr last;
+ int type;
unsigned int lookup_flags = 0;
retry:
- name = user_path_parent(dfd, pathname, &nd, lookup_flags);
+ name = user_path_parent(dfd, pathname,
+ &path, &last, &type, lookup_flags);
if (IS_ERR(name))
return PTR_ERR(name);

- switch(nd.last_type) {
+ switch (type) {
case LAST_DOTDOT:
error = -ENOTEMPTY;
goto exit1;
@@ -3642,13 +3652,12 @@ retry:
goto exit1;
}

- nd.flags &= ~LOOKUP_PARENT;
- error = mnt_want_write(nd.path.mnt);
+ error = mnt_want_write(path.mnt);
if (error)
goto exit1;

- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
+ mutex_lock_nested(&path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ dentry = __lookup_hash(&last, path.dentry, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
@@ -3656,17 +3665,17 @@ retry:
error = -ENOENT;
goto exit3;
}
- error = security_path_rmdir(&nd.path, dentry);
+ error = security_path_rmdir(&path, dentry);
if (error)
goto exit3;
- error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+ error = vfs_rmdir(path.dentry->d_inode, dentry);
exit3:
dput(dentry);
exit2:
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
- mnt_drop_write(nd.path.mnt);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
+ mnt_drop_write(path.mnt);
exit1:
- path_put(&nd.path);
+ path_put(&path);
putname(name);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -3749,43 +3758,45 @@ static long do_unlinkat(int dfd, const char __user *pathname)
int error;
struct filename *name;
struct dentry *dentry;
- struct nameidata nd;
+ struct path path;
+ struct qstr last;
+ int type;
struct inode *inode = NULL;
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0;
retry:
- name = user_path_parent(dfd, pathname, &nd, lookup_flags);
+ name = user_path_parent(dfd, pathname,
+ &path, &last, &type, lookup_flags);
if (IS_ERR(name))
return PTR_ERR(name);

error = -EISDIR;
- if (nd.last_type != LAST_NORM)
+ if (type != LAST_NORM)
goto exit1;

- nd.flags &= ~LOOKUP_PARENT;
- error = mnt_want_write(nd.path.mnt);
+ error = mnt_want_write(path.mnt);
if (error)
goto exit1;
retry_deleg:
- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
+ mutex_lock_nested(&path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ dentry = __lookup_hash(&last, path.dentry, lookup_flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
/* Why not before? Because we want correct error value */
- if (nd.last.name[nd.last.len])
+ if (last.name[last.len])
goto slashes;
inode = dentry->d_inode;
if (d_is_negative(dentry))
goto slashes;
ihold(inode);
- error = security_path_unlink(&nd.path, dentry);
+ error = security_path_unlink(&path, dentry);
if (error)
goto exit2;
- error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
+ error = vfs_unlink(path.dentry->d_inode, dentry, &delegated_inode);
exit2:
dput(dentry);
}
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
if (inode)
iput(inode); /* truncate the inode here */
inode = NULL;
@@ -3794,9 +3805,9 @@ exit2:
if (!error)
goto retry_deleg;
}
- mnt_drop_write(nd.path.mnt);
+ mnt_drop_write(path.mnt);
exit1:
- path_put(&nd.path);
+ path_put(&path);
putname(name);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -4226,14 +4237,15 @@ EXPORT_SYMBOL(vfs_rename);
SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
int, newdfd, const char __user *, newname, unsigned int, flags)
{
- struct dentry *old_dir, *new_dir;
struct dentry *old_dentry, *new_dentry;
struct dentry *trap;
- struct nameidata oldnd, newnd;
+ struct path old_path, new_path;
+ struct qstr old_last, new_last;
+ int old_type, new_type;
struct inode *delegated_inode = NULL;
struct filename *from;
struct filename *to;
- unsigned int lookup_flags = 0;
+ unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
bool should_retry = false;
int error;

@@ -4247,47 +4259,45 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
if ((flags & RENAME_WHITEOUT) && !capable(CAP_MKNOD))
return -EPERM;

+ if (flags & RENAME_EXCHANGE)
+ target_flags = 0;
+
retry:
- from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
+ from = user_path_parent(olddfd, oldname,
+ &old_path, &old_last, &old_type, lookup_flags);
if (IS_ERR(from)) {
error = PTR_ERR(from);
goto exit;
}

- to = user_path_parent(newdfd, newname, &newnd, lookup_flags);
+ to = user_path_parent(newdfd, newname,
+ &new_path, &new_last, &new_type, lookup_flags);
if (IS_ERR(to)) {
error = PTR_ERR(to);
goto exit1;
}

error = -EXDEV;
- if (oldnd.path.mnt != newnd.path.mnt)
+ if (old_path.mnt != new_path.mnt)
goto exit2;

- old_dir = oldnd.path.dentry;
error = -EBUSY;
- if (oldnd.last_type != LAST_NORM)
+ if (old_type != LAST_NORM)
goto exit2;

- new_dir = newnd.path.dentry;
if (flags & RENAME_NOREPLACE)
error = -EEXIST;
- if (newnd.last_type != LAST_NORM)
+ if (new_type != LAST_NORM)
goto exit2;

- error = mnt_want_write(oldnd.path.mnt);
+ error = mnt_want_write(old_path.mnt);
if (error)
goto exit2;

- oldnd.flags &= ~LOOKUP_PARENT;
- newnd.flags &= ~LOOKUP_PARENT;
- if (!(flags & RENAME_EXCHANGE))
- newnd.flags |= LOOKUP_RENAME_TARGET;
-
retry_deleg:
- trap = lock_rename(new_dir, old_dir);
+ trap = lock_rename(new_path.dentry, old_path.dentry);

- old_dentry = __lookup_hash(&oldnd.last, oldnd.path.dentry, oldnd.flags);
+ old_dentry = __lookup_hash(&old_last, old_path.dentry, lookup_flags);
error = PTR_ERR(old_dentry);
if (IS_ERR(old_dentry))
goto exit3;
@@ -4295,7 +4305,7 @@ retry_deleg:
error = -ENOENT;
if (d_is_negative(old_dentry))
goto exit4;
- new_dentry = __lookup_hash(&newnd.last, newnd.path.dentry, newnd.flags);
+ new_dentry = __lookup_hash(&new_last, new_path.dentry, lookup_flags | target_flags);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto exit4;
@@ -4309,16 +4319,16 @@ retry_deleg:

if (!d_is_dir(new_dentry)) {
error = -ENOTDIR;
- if (newnd.last.name[newnd.last.len])
+ if (new_last.name[new_last.len])
goto exit5;
}
}
/* unless the source is a directory trailing slashes give -ENOTDIR */
if (!d_is_dir(old_dentry)) {
error = -ENOTDIR;
- if (oldnd.last.name[oldnd.last.len])
+ if (old_last.name[old_last.len])
goto exit5;
- if (!(flags & RENAME_EXCHANGE) && newnd.last.name[newnd.last.len])
+ if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
goto exit5;
}
/* source should not be ancestor of target */
@@ -4331,32 +4341,32 @@ retry_deleg:
if (new_dentry == trap)
goto exit5;

- error = security_path_rename(&oldnd.path, old_dentry,
- &newnd.path, new_dentry, flags);
+ error = security_path_rename(&old_path, old_dentry,
+ &new_path, new_dentry, flags);
if (error)
goto exit5;
- error = vfs_rename(old_dir->d_inode, old_dentry,
- new_dir->d_inode, new_dentry,
+ error = vfs_rename(old_path.dentry->d_inode, old_dentry,
+ new_path.dentry->d_inode, new_dentry,
&delegated_inode, flags);
exit5:
dput(new_dentry);
exit4:
dput(old_dentry);
exit3:
- unlock_rename(new_dir, old_dir);
+ unlock_rename(new_path.dentry, old_path.dentry);
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
if (!error)
goto retry_deleg;
}
- mnt_drop_write(oldnd.path.mnt);
+ mnt_drop_write(old_path.mnt);
exit2:
if (retry_estale(error, lookup_flags))
should_retry = true;
- path_put(&newnd.path);
+ path_put(&new_path);
putname(to);
exit1:
- path_put(&oldnd.path);
+ path_put(&old_path);
putname(from);
if (should_retry) {
should_retry = false;
--
2.1.4

2015-05-05 05:41:55

by Al Viro

Subject: [PATCH 28/79] namei: lift nameidata into filename_mountpoint()

From: Al Viro <[email protected]>

when we go for on-demand allocation of saved state in
link_path_walk(), we'll want nameidata to stay around
for all 3 calls of path_mountpoint().

Signed-off-by: Al Viro <[email protected]>
---
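The "3 calls" mentioned above are the usual retry ladder: first in RCU mode, again without it on -ECHILD, and once more with revalidation on -ESTALE. The sketch below models that ladder with one state object owned by the outer function, as this patch arranges; `toy_nameidata`, `toy_walk_fn`, the `TOY_*` flags and the sample walker are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in flag values, chosen only for this sketch. */
#define TOY_LOOKUP_RCU   0x1u
#define TOY_LOOKUP_REVAL 0x2u

struct toy_nameidata {
	unsigned int flags;
	int attempts;		/* how many walks used this state */
};

typedef int (*toy_walk_fn)(struct toy_nameidata *nd, unsigned int flags);

/* One nameidata on the outer frame, shared by up to three attempts. */
static int toy_filename_mountpoint(toy_walk_fn walk, unsigned int flags,
				   int *attempts)
{
	struct toy_nameidata nd = { 0 };
	int error;

	assert(walk != NULL && attempts != NULL);
	error = walk(&nd, flags | TOY_LOOKUP_RCU);
	if (error == -ECHILD)
		error = walk(&nd, flags);
	if (error == -ESTALE)
		error = walk(&nd, flags | TOY_LOOKUP_REVAL);
	*attempts = nd.attempts;
	return error;
}

/* Sample walker: bails out of RCU mode, then wants revalidation. */
static int toy_flaky_walk(struct toy_nameidata *nd, unsigned int flags)
{
	nd->attempts++;
	nd->flags = flags;
	if (flags & TOY_LOOKUP_RCU)
		return -ECHILD;
	if (!(flags & TOY_LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}
```

With `toy_flaky_walk` as the walker, all three rungs are used and the same `toy_nameidata` sees every attempt, which is the property the commit message says later patches will rely on.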
fs/namei.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fed19ad..b9cfdc1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2336,31 +2336,28 @@ out:
*/
static int
path_mountpoint(int dfd, const struct filename *name, struct path *path,
- unsigned int flags)
+ struct nameidata *nd, unsigned int flags)
{
- struct nameidata nd;
- int err;
-
- err = path_init(dfd, name, flags, &nd);
+ int err = path_init(dfd, name, flags, nd);
if (unlikely(err))
goto out;

- err = mountpoint_last(&nd, path);
+ err = mountpoint_last(nd, path);
while (err > 0) {
void *cookie;
struct path link = *path;
- err = may_follow_link(&link, &nd);
+ err = may_follow_link(&link, nd);
if (unlikely(err))
break;
- nd.flags |= LOOKUP_PARENT;
- err = follow_link(&link, &nd, &cookie);
+ nd->flags |= LOOKUP_PARENT;
+ err = follow_link(&link, nd, &cookie);
if (err)
break;
- err = mountpoint_last(&nd, path);
- put_link(&nd, &link, cookie);
+ err = mountpoint_last(nd, path);
+ put_link(nd, &link, cookie);
}
out:
- path_cleanup(&nd);
+ path_cleanup(nd);
return err;
}

@@ -2368,14 +2365,15 @@ static int
filename_mountpoint(int dfd, struct filename *name, struct path *path,
unsigned int flags)
{
+ struct nameidata nd;
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- error = path_mountpoint(dfd, name, path, flags | LOOKUP_RCU);
+ error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
- error = path_mountpoint(dfd, name, path, flags);
+ error = path_mountpoint(dfd, name, path, &nd, flags);
if (unlikely(error == -ESTALE))
- error = path_mountpoint(dfd, name, path, flags | LOOKUP_REVAL);
+ error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
putname(name);
--
2.1.4

2015-05-05 05:26:15

by Al Viro

Subject: [PATCH 29/79] new ->follow_link() and ->put_link() calling conventions

From: Al Viro <[email protected]>

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() now _stores_
that opaque pointer (into a void * passed by address by the caller) and returns
the symlink body. It returns ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and a pointer to the symlink body for normal symlinks. The stored
pointer is ignored in all cases except the last one.

Storing NULL for the opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases where we used the symlink body
to free stuff, ->follow_link() should now store it as the opaque pointer in
addition to returning it.

Signed-off-by: Al Viro <[email protected]>
---
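As a reading aid for the convention in (a) and (b), here is a userspace model of one instance and its caller. It is a sketch under assumptions, not kernel code: the `toy_*` names are invented, `toy_err_ptr()`/`toy_is_err()` only imitate ERR_PTR()/IS_ERR(), and the dentry and nameidata arguments of the real methods are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR() helpers. */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static int toy_put_calls;	/* counts ->put_link() invocations */

/* Returns the body; stores a cookie only when something needs freeing. */
static const char *toy_follow_link(const char *stored_target, void **cookie)
{
	char *body;

	if (!stored_target)
		return toy_err_ptr(-ENOENT);
	body = malloc(strlen(stored_target) + 1);
	if (!body)
		return toy_err_ptr(-ENOMEM);
	strcpy(body, stored_target);
	*cookie = body;		/* case (b): the body doubles as the cookie */
	return body;
}

static void toy_put_link(void *cookie)
{
	assert(cookie != NULL);	/* a NULL cookie means we are never called */
	toy_put_calls++;
	free(cookie);
}

/* Caller side: cookie starts out NULL, so "not stored" is detectable. */
static long toy_walk_symlink(const char *stored_target, size_t *len_out)
{
	void *cookie = NULL;
	const char *body = toy_follow_link(stored_target, &cookie);

	if (toy_is_err(body))
		return (long)(intptr_t)body;
	if (body)		/* NULL would mean a jump already happened */
		*len_out = strlen(body);
	if (cookie)
		toy_put_link(cookie);
	return 0;
}
```

An instance whose body lives in the inode (as in the autofs4 conversion in this patch) would simply return the pointer and leave the cookie alone, so the caller skips the put step.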
Documentation/filesystems/Locking | 4 +-
Documentation/filesystems/vfs.txt | 4 +-
drivers/staging/lustre/lustre/llite/symlink.c | 11 ++---
fs/9p/vfs_inode.c | 13 +++---
fs/9p/vfs_inode_dotl.c | 7 ++-
fs/autofs4/symlink.c | 5 +-
fs/befs/linuxvfs.c | 35 +++++++-------
fs/cifs/cifsfs.h | 2 +-
fs/cifs/link.c | 28 ++++++------
fs/configfs/symlink.c | 28 +++++-------
fs/ecryptfs/inode.c | 8 ++--
fs/ext4/symlink.c | 9 ++--
fs/f2fs/namei.c | 20 +++-----
fs/fuse/dir.c | 19 ++------
fs/gfs2/inode.c | 10 ++--
fs/hostfs/hostfs_kern.c | 15 +++---
fs/hppfs/hppfs.c | 9 ++--
fs/kernfs/symlink.c | 22 ++++-----
fs/libfs.c | 12 ++---
fs/namei.c | 66 +++++++++------------------
fs/nfs/symlink.c | 19 +++-----
fs/overlayfs/inode.c | 18 ++++----
fs/proc/base.c | 2 +-
fs/proc/inode.c | 9 ++--
fs/proc/namespaces.c | 2 +-
fs/proc/self.c | 24 +++++-----
fs/proc/thread_self.c | 22 ++++-----
fs/xfs/xfs_iops.c | 10 ++--
include/linux/fs.h | 12 ++---
include/linux/namei.h | 2 -
mm/shmem.c | 23 +++++-----
31 files changed, 195 insertions(+), 275 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0a926e2..7fa6c4a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,8 +50,8 @@ prototypes:
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ void (*put_link) (struct dentry *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5d833b3..1c6b03a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ void (*put_link) (struct dentry *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 3711e67..e488cb3 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
return rc;
}

-static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct ptlrpc_request *request = NULL;
@@ -140,18 +140,17 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
}
if (rc) {
ptlrpc_req_finished(request);
- request = NULL;
- symname = ERR_PTR(rc);
+ return ERR_PTR(rc);
}

- nd_set_link(nd, symname);
/* symname may contain a pointer to the request message buffer,
* we delay request releasing until ll_put_link then.
*/
- return request;
+ *cookie = request;
+ return symname;
}

-static void ll_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void ll_put_link(struct dentry *dentry, void *cookie)
{
ptlrpc_req_finished(cookie);
}
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0ba1171..7cc70a3 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1230,11 +1230,12 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
*
*/

-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
struct p9_fid *fid = v9fs_fid_lookup(dentry);
struct p9_wstat *st;
+ char *res;

p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

@@ -1253,14 +1254,14 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
kfree(st);
return ERR_PTR(-EINVAL);
}
- if (strlen(st->extension) >= PATH_MAX)
- st->extension[PATH_MAX - 1] = '\0';
-
- nd_set_link(nd, st->extension);
+ res = st->extension;
st->extension = NULL;
+ if (strlen(res) >= PATH_MAX)
+ res[PATH_MAX - 1] = '\0';
+
p9stat_free(st);
kfree(st);
- return NULL;
+ return *cookie = res;
}

/**
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index bc2a91f..ae062ff 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -909,8 +909,8 @@ error:
*
*/

-static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
+static const char *
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
@@ -923,8 +923,7 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
retval = p9_client_readlink(fid, &target);
if (retval)
return ERR_PTR(retval);
- nd_set_link(nd, target);
- return NULL;
+ return *cookie = target;
}

int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index de58cc7..9c6a077 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,14 +12,13 @@

#include "autofs_i.h"

-static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *autofs4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
if (ino && !autofs4_oz_mode(sbi))
ino->last_used = jiffies;
- nd_set_link(nd, d_inode(dentry)->i_private);
- return NULL;
+ return d_inode(dentry)->i_private;
}

const struct inode_operations autofs4_symlink_inode_operations = {
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 172e306..3a1aefb 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,7 +42,7 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *, struct nameidata *);
+static const char *befs_follow_link(struct dentry *, void **, struct nameidata *nd);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -463,8 +463,8 @@ befs_destroy_inodecache(void)
* The data stream become link name. Unless the LONG_SYMLINK
* flag is set.
*/
-static void *
-befs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *
+befs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct super_block *sb = dentry->d_sb;
struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
@@ -474,23 +474,20 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)

if (len == 0) {
befs_error(sb, "Long symlink with illegal length");
- link = ERR_PTR(-EIO);
- } else {
- befs_debug(sb, "Follow long symlink");
-
- link = kmalloc(len, GFP_NOFS);
- if (!link) {
- link = ERR_PTR(-ENOMEM);
- } else if (befs_read_lsymlink(sb, data, link, len) != len) {
- kfree(link);
- befs_error(sb, "Failed to read entire long symlink");
- link = ERR_PTR(-EIO);
- } else {
- link[len - 1] = '\0';
- }
+ return ERR_PTR(-EIO);
}
- nd_set_link(nd, link);
- return NULL;
+ befs_debug(sb, "Follow long symlink");
+
+ link = kmalloc(len, GFP_NOFS);
+ if (!link)
+ return ERR_PTR(-ENOMEM);
+ if (befs_read_lsymlink(sb, data, link, len) != len) {
+ kfree(link);
+ befs_error(sb, "Failed to read entire long symlink");
+ return ERR_PTR(-EIO);
+ }
+ link[len - 1] = '\0';
+ return *cookie = link;
}

/*
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 252f5c1..61012da 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
#endif

/* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern const char *cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd);
extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
int buflen);
extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 252e672..4a439c2 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -626,8 +626,8 @@ cifs_hl_exit:
return rc;
}

-void *
-cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
+const char *
+cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(direntry);
int rc = -ENOMEM;
@@ -643,16 +643,18 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)

tlink = cifs_sb_tlink(cifs_sb);
if (IS_ERR(tlink)) {
- rc = PTR_ERR(tlink);
- tlink = NULL;
- goto out;
+ free_xid(xid);
+ return ERR_CAST(tlink);
}
tcon = tlink_tcon(tlink);
server = tcon->ses->server;

full_path = build_path_from_dentry(direntry);
- if (!full_path)
- goto out;
+ if (!full_path) {
+ free_xid(xid);
+ cifs_put_tlink(tlink);
+ return ERR_PTR(-ENOMEM);
+ }

cifs_dbg(FYI, "Full path: %s inode = 0x%p\n", full_path, inode);

@@ -670,17 +672,13 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
&target_path, cifs_sb);

kfree(full_path);
-out:
+ free_xid(xid);
+ cifs_put_tlink(tlink);
if (rc != 0) {
kfree(target_path);
- target_path = ERR_PTR(rc);
+ return ERR_PTR(rc);
}
-
- free_xid(xid);
- if (tlink)
- cifs_put_tlink(tlink);
- nd_set_link(nd, target_path);
- return NULL;
+ return *cookie = target_path;
}

int
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f254..fac8e85 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,30 +279,26 @@ static int configfs_getlink(struct dentry *dentry, char * path)

}

-static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *configfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
+ int error;

- if (page) {
- error = configfs_getlink(dentry, (char *)page);
- if (!error) {
- nd_set_link(nd, (char *)page);
- return (void *)page;
- }
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ error = configfs_getlink(dentry, (char *)page);
+ if (!error) {
+ return *cookie = (void *)page;
}

- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ free_page(page);
+ return ERR_PTR(error);
}

-static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void configfs_put_link(struct dentry *dentry, void *cookie)
{
- if (cookie) {
- unsigned long page = (unsigned long)cookie;
- free_page(page);
- }
+ free_page((unsigned long)cookie);
}

const struct inode_operations configfs_symlink_inode_operations = {
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index fc850b5..cdb9d6c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -675,18 +675,16 @@ out:
return rc ? ERR_PTR(rc) : buf;
}

-static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
size_t len;
char *buf = ecryptfs_readlink_lower(dentry, &len);
if (IS_ERR(buf))
- goto out;
+ return buf;
fsstack_copy_attr_atime(d_inode(dentry),
d_inode(ecryptfs_dentry_to_lower(dentry)));
buf[len] = '\0';
-out:
- nd_set_link(nd, buf);
- return NULL;
+ return *cookie = buf;
}

/**
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 1720a13..f4c1c73 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,7 +23,7 @@
#include "xattr.h"

#ifdef CONFIG_EXT4_FS_ENCRYPTION
-static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ext4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *cpage = NULL;
char *caddr, *paddr = NULL;
@@ -37,7 +37,7 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)

ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize);
if (IS_ERR(ctx))
- return ctx;
+ return ERR_CAST(ctx);

if (ext4_inode_is_fast_symlink(inode)) {
caddr = (char *) EXT4_I(inode)->i_data;
@@ -46,7 +46,7 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
cpage = read_mapping_page(inode->i_mapping, 0, NULL);
if (IS_ERR(cpage)) {
ext4_put_fname_crypto_ctx(&ctx);
- return cpage;
+ return ERR_CAST(cpage);
}
caddr = kmap(cpage);
caddr[size] = 0;
@@ -77,13 +77,12 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
/* Null-terminate the name */
if (res <= plen)
paddr[res] = '\0';
- nd_set_link(nd, paddr);
ext4_put_fname_crypto_ctx(&ctx);
if (cpage) {
kunmap(cpage);
page_cache_release(cpage);
}
- return NULL;
+ return *cookie = paddr;
errout:
ext4_put_fname_crypto_ctx(&ctx);
if (cpage) {
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 7e3794e..d294793 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -296,21 +296,15 @@ fail:
return err;
}

-static void *f2fs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *f2fs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- struct page *page;
-
- page = page_follow_link_light(dentry, nd);
- if (IS_ERR(page))
- return page;
-
- /* this is broken symlink case */
- if (*nd_get_link(nd) == 0) {
- kunmap(page);
- page_cache_release(page);
- return ERR_PTR(-ENOENT);
+ const char *link = page_follow_link_light(dentry, cookie, nd);
+ if (!IS_ERR(link) && !*link) {
+ /* this is broken symlink case */
+ page_put_link(dentry, *cookie);
+ link = ERR_PTR(-ENOENT);
}
- return page;
+ return link;
}

static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0572bca..f9cb260 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1365,7 +1365,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
return err;
}

-static char *read_link(struct dentry *dentry)
+static const char *fuse_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1389,26 +1389,15 @@ static char *read_link(struct dentry *dentry)
link = ERR_PTR(ret);
} else {
link[ret] = '\0';
+ *cookie = link;
}
fuse_invalidate_atime(inode);
return link;
}

-static void free_link(char *link)
+static void fuse_put_link(struct dentry *dentry, void *cookie)
{
- if (!IS_ERR(link))
- free_page((unsigned long) link);
-}
-
-static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, read_link(dentry));
- return NULL;
-}
-
-static void fuse_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
-{
- free_link(nd_get_link(nd));
+ free_page((unsigned long) cookie);
}

static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 1b3ca7a..f59390a 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1548,7 +1548,7 @@ out:
* Returns: 0 on success or error code
*/

-static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *gfs2_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
struct gfs2_holder i_gh;
@@ -1561,8 +1561,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
error = gfs2_glock_nq(&i_gh);
if (error) {
gfs2_holder_uninit(&i_gh);
- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ return ERR_PTR(error);
}

size = (unsigned int)i_size_read(&ip->i_inode);
@@ -1586,8 +1585,9 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
brelse(dibh);
out:
gfs2_glock_dq_uninit(&i_gh);
- nd_set_link(nd, buf);
- return NULL;
+ if (!IS_ERR(buf))
+ *cookie = buf;
+ return buf;
}

/**
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index ef26317..f650ed6 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -892,7 +892,7 @@ static const struct inode_operations hostfs_dir_iops = {
.setattr = hostfs_setattr,
};

-static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *hostfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
char *link = __getname();
if (link) {
@@ -906,21 +906,18 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
}
if (err < 0) {
__putname(link);
- link = ERR_PTR(err);
+ return ERR_PTR(err);
}
} else {
- link = ERR_PTR(-ENOMEM);
+ return ERR_PTR(-ENOMEM);
}

- nd_set_link(nd, link);
- return NULL;
+ return *cookie = link;
}

-static void hostfs_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, void *cookie)
{
- char *s = nd_get_link(nd);
- if (!IS_ERR(s))
- __putname(s);
+ __putname(cookie);
}

static const struct inode_operations hostfs_link_iops = {
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index fa2bd53..b8f24d3 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,20 +642,19 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
buflen);
}

-static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *hppfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

- return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, nd);
+ return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie, nd);
}

-static void hppfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void hppfs_put_link(struct dentry *dentry, void *cookie)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

if (d_inode(proc_dentry)->i_op->put_link)
- d_inode(proc_dentry)->i_op->put_link(proc_dentry, nd, cookie);
+ d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
}

static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a19889..3c7e799 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,25 +112,23 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
return error;
}

-static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
- if (page) {
- error = kernfs_getlink(dentry, (char *) page);
- if (error < 0)
- free_page((unsigned long)page);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+ error = kernfs_getlink(dentry, (char *)page);
+ if (unlikely(error < 0)) {
+ free_page((unsigned long)page);
+ return ERR_PTR(error);
}
- nd_set_link(nd, error ? ERR_PTR(error) : (char *)page);
- return NULL;
+ return *cookie = (char *)page;
}

-static void kernfs_iop_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void kernfs_iop_put_link(struct dentry *dentry, void *cookie)
{
- char *page = nd_get_link(nd);
- if (!IS_ERR(page))
- free_page((unsigned long)page);
+ free_page((unsigned long)cookie);
}

const struct inode_operations kernfs_symlink_iops = {
diff --git a/fs/libfs.c b/fs/libfs.c
index 72e4e01..0c83fde 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,12 +1024,9 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}
EXPORT_SYMBOL(noop_fsync);

-void kfree_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+void kfree_put_link(struct dentry *dentry, void *cookie)
{
- char *s = nd_get_link(nd);
- if (!IS_ERR(s))
- kfree(s);
+ kfree(cookie);
}
EXPORT_SYMBOL(kfree_put_link);

@@ -1094,10 +1091,9 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
}
EXPORT_SYMBOL(simple_nosetlease);

-void *simple_follow_link(struct dentry *dentry, struct nameidata *nd)
+const char *simple_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- nd_set_link(nd, d_inode(dentry)->i_link);
- return NULL;
+ return d_inode(dentry)->i_link;
}
EXPORT_SYMBOL(simple_follow_link);

diff --git a/fs/namei.c b/fs/namei.c
index b9cfdc1..862e659 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -502,7 +502,6 @@ struct nameidata {
int last_type;
unsigned depth;
struct file *base;
- char *saved_names[MAX_NESTED_LINKS + 1];
};

/*
@@ -713,23 +712,11 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
nd->flags |= LOOKUP_JUMPED;
}

-void nd_set_link(struct nameidata *nd, char *path)
-{
- nd->saved_names[nd->depth] = path;
-}
-EXPORT_SYMBOL(nd_set_link);
-
-char *nd_get_link(struct nameidata *nd)
-{
- return nd->saved_names[nd->depth];
-}
-EXPORT_SYMBOL(nd_get_link);
-
static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
{
struct inode *inode = link->dentry->d_inode;
- if (inode->i_op->put_link)
- inode->i_op->put_link(link->dentry, nd, cookie);
+ if (cookie && inode->i_op->put_link)
+ inode->i_op->put_link(link->dentry, cookie);
path_put(link);
}

@@ -854,7 +841,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
{
struct dentry *dentry = link->dentry;
int error;
- char *s;
+ const char *s;

BUG_ON(nd->flags & LOOKUP_RCU);

@@ -869,26 +856,20 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
current->total_link_count++;

touch_atime(link);
- nd_set_link(nd, NULL);

error = security_inode_follow_link(dentry);
if (error)
goto out_put_nd_path;

nd->last_type = LAST_BIND;
- *p = dentry->d_inode->i_op->follow_link(dentry, nd);
- error = PTR_ERR(*p);
- if (IS_ERR(*p))
+ *p = NULL;
+ s = dentry->d_inode->i_op->follow_link(dentry, p, nd);
+ error = PTR_ERR(s);
+ if (IS_ERR(s))
goto out_put_nd_path;

error = 0;
- s = nd_get_link(nd);
if (s) {
- if (unlikely(IS_ERR(s))) {
- path_put(&nd->path);
- put_link(nd, link, *p);
- return PTR_ERR(s);
- }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -906,7 +887,6 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
return error;

out_put_nd_path:
- *p = NULL;
path_put(&nd->path);
path_put(link);
return error;
@@ -4423,18 +4403,15 @@ EXPORT_SYMBOL(readlink_copy);
*/
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
- struct nameidata nd;
void *cookie;
+ const char *link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
int res;

- nd.depth = 0;
- cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
- if (IS_ERR(cookie))
- return PTR_ERR(cookie);
-
- res = readlink_copy(buffer, buflen, nd_get_link(&nd));
- if (dentry->d_inode->i_op->put_link)
- dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ res = readlink_copy(buffer, buflen, link);
+ if (cookie && dentry->d_inode->i_op->put_link)
+ dentry->d_inode->i_op->put_link(dentry, cookie);
return res;
}
EXPORT_SYMBOL(generic_readlink);
@@ -4466,22 +4443,21 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
}
EXPORT_SYMBOL(page_readlink);

-void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+const char *page_follow_link_light(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *page = NULL;
- nd_set_link(nd, page_getlink(dentry, &page));
- return page;
+ char *res = page_getlink(dentry, &page);
+ if (!IS_ERR(res))
+ *cookie = page;
+ return res;
}
EXPORT_SYMBOL(page_follow_link_light);

-void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+void page_put_link(struct dentry *dentry, void *cookie)
{
struct page *page = cookie;
-
- if (page) {
- kunmap(page);
- page_cache_release(page);
- }
+ kunmap(page);
+ page_cache_release(page);
}
EXPORT_SYMBOL(page_put_link);

diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 2d56200..c992b20 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -20,7 +20,6 @@
#include <linux/stat.h>
#include <linux/mm.h>
#include <linux/string.h>
-#include <linux/namei.h>

/* Symlink caching in the page cache is even more simplistic
* and straight-forward than readdir caching.
@@ -43,7 +42,7 @@ error:
return -EIO;
}

-static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *nfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct page *page;
@@ -51,19 +50,13 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)

err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
if (err)
- goto read_failed;
+ return err;
page = read_cache_page(&inode->i_data, 0,
(filler_t *)nfs_symlink_filler, inode);
- if (IS_ERR(page)) {
- err = page;
- goto read_failed;
- }
- nd_set_link(nd, kmap(page));
- return page;
-
-read_failed:
- nd_set_link(nd, err);
- return NULL;
+ if (IS_ERR(page))
+ return ERR_CAST(page);
+ *cookie = page;
+ return kmap(page);
}

/*
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1b4b9c5e..235ad42 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,12 +140,12 @@ struct ovl_link_data {
void *cookie;
};

-static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- void *ret;
struct dentry *realdentry;
struct inode *realinode;
struct ovl_link_data *data = NULL;
+ const char *ret;

realdentry = ovl_dentry_real(dentry);
realinode = realdentry->d_inode;
@@ -160,19 +160,21 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
data->realdentry = realdentry;
}

- ret = realinode->i_op->follow_link(realdentry, nd);
- if (IS_ERR(ret)) {
+ ret = realinode->i_op->follow_link(realdentry, cookie, nd);
+ if (IS_ERR_OR_NULL(ret)) {
kfree(data);
return ret;
}

if (data)
- data->cookie = ret;
+ data->cookie = *cookie;

- return data;
+ *cookie = data;
+
+ return ret;
}

-static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void ovl_put_link(struct dentry *dentry, void *c)
{
struct inode *realinode;
struct ovl_link_data *data = c;
@@ -181,7 +183,7 @@ static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
return;

realinode = data->realdentry->d_inode;
- realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+ realinode->i_op->put_link(data->realdentry, data->cookie);
kfree(data);
}

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 093ca14..52652f8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1380,7 +1380,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
return -ENOENT;
}

-static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct path path;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 8272aab..acd51d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -23,7 +23,6 @@
#include <linux/slab.h>
#include <linux/mount.h>
#include <linux/magic.h>
-#include <linux/namei.h>

#include <asm/uaccess.h>

@@ -394,16 +393,16 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
};
#endif

-static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct proc_dir_entry *pde = PDE(d_inode(dentry));
if (unlikely(!use_pde(pde)))
return ERR_PTR(-EINVAL);
- nd_set_link(nd, pde->data);
- return pde;
+ *cookie = pde;
+ return pde->data;
}

-static void proc_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void proc_put_link(struct dentry *dentry, void *p)
{
unuse_pde(p);
}
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index e512642..10d24dd 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
&mntns_operations,
};

-static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 6195b4a..ad33394 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -1,5 +1,4 @@
#include <linux/sched.h>
-#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/pid_namespace.h>
#include "internal.h"
@@ -19,21 +18,20 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
- char *name = ERR_PTR(-ENOENT);
- if (tgid) {
- /* 11 for max length of signed int in decimal + NULL term */
- name = kmalloc(12, GFP_KERNEL);
- if (!name)
- name = ERR_PTR(-ENOMEM);
- else
- sprintf(name, "%d", tgid);
- }
- nd_set_link(nd, name);
- return NULL;
+ char *name;
+
+ if (!tgid)
+ return ERR_PTR(-ENOENT);
+ /* 11 for max length of signed int in decimal + NULL term */
+ name = kmalloc(12, GFP_KERNEL);
+ if (!name)
+ return ERR_PTR(-ENOMEM);
+ sprintf(name, "%d", tgid);
+ return *cookie = name;
}

static const struct inode_operations proc_self_inode_operations = {
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index a837199..85c96e0 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -1,5 +1,4 @@
#include <linux/sched.h>
-#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/pid_namespace.h>
#include "internal.h"
@@ -20,21 +19,20 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
pid_t pid = task_pid_nr_ns(current, ns);
- char *name = ERR_PTR(-ENOENT);
- if (pid) {
- name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
- if (!name)
- name = ERR_PTR(-ENOMEM);
- else
- sprintf(name, "%d/task/%d", tgid, pid);
- }
- nd_set_link(nd, name);
- return NULL;
+ char *name;
+
+ if (!pid)
+ return ERR_PTR(-ENOENT);
+ name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
+ if (!name)
+ return ERR_PTR(-ENOMEM);
+ sprintf(name, "%d/task/%d", tgid, pid);
+ return *cookie = name;
}

static const struct inode_operations proc_thread_self_inode_operations = {
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f4cd720..26c4dcb 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -41,7 +41,6 @@

#include <linux/capability.h>
#include <linux/xattr.h>
-#include <linux/namei.h>
#include <linux/posix_acl.h>
#include <linux/security.h>
#include <linux/fiemap.h>
@@ -414,9 +413,10 @@ xfs_vn_rename(
* we need to be very careful about how much stack we use.
* uio is kmalloced for this reason...
*/
-STATIC void *
+STATIC const char *
xfs_vn_follow_link(
struct dentry *dentry,
+ void **cookie,
struct nameidata *nd)
{
char *link;
@@ -430,14 +430,12 @@ xfs_vn_follow_link(
if (unlikely(error))
goto out_kfree;

- nd_set_link(nd, link);
- return NULL;
+ return *cookie = link;

out_kfree:
kfree(link);
out_err:
- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ return ERR_PTR(error);
}

STATIC int
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ac758f..9ab9341 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1608,12 +1608,12 @@ struct file_operations {

struct inode_operations {
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
+ const char * (*follow_link) (struct dentry *, void **, struct nameidata *);
int (*permission) (struct inode *, int);
struct posix_acl * (*get_acl)(struct inode *, int);

int (*readlink) (struct dentry *, char __user *,int);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ void (*put_link) (struct dentry *, void *);

int (*create) (struct inode *,struct dentry *, umode_t, bool);
int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2705,13 +2705,13 @@ extern const struct file_operations generic_ro_fops;

extern int readlink_copy(char __user *, int, const char *);
extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *, struct nameidata *);
-extern void page_put_link(struct dentry *, struct nameidata *, void *);
+extern const char *page_follow_link_light(struct dentry *, void **, struct nameidata *);
+extern void page_put_link(struct dentry *, void *);
extern int __page_symlink(struct inode *inode, const char *symname, int len,
int nofs);
extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, struct nameidata *, void *);
+extern void kfree_put_link(struct dentry *, void *);
extern int generic_readlink(struct dentry *, char __user *, int);
extern void generic_fillattr(struct inode *, struct kstat *);
int vfs_getattr_nosec(struct path *path, struct kstat *stat);
@@ -2722,7 +2722,7 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
-void *simple_follow_link(struct dentry *, struct nameidata *);
+const char *simple_follow_link(struct dentry *, void **, struct nameidata *);
extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c899077..a5d5bed 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -71,8 +71,6 @@ extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);

extern void nd_jump_link(struct nameidata *nd, struct path *path);
-extern void nd_set_link(struct nameidata *nd, char *path);
-extern char *nd_get_link(struct nameidata *nd);

static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index 7f6e2f8..d1693dc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2475,24 +2475,23 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *shmem_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *page = NULL;
int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
- nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
- if (page)
- unlock_page(page);
- return page;
+ if (error)
+ return ERR_PTR(error);
+ unlock_page(page);
+ *cookie = page;
+ return kmap(page);
}

-static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void shmem_put_link(struct dentry *dentry, void *cookie)
{
- if (!IS_ERR(nd_get_link(nd))) {
- struct page *page = cookie;
- kunmap(page);
- mark_page_accessed(page);
- page_cache_release(page);
- }
+ struct page *page = cookie;
+ kunmap(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
}

#ifdef CONFIG_TMPFS_XATTR
--
2.1.4
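
The conversions in the diff above all follow one pattern: ->follow_link()
now returns the link body directly and hands whatever ->put_link() will need
back through a void ** cookie, and ->put_link() no longer takes a nameidata.
A minimal user-space sketch of that convention follows. The toy_* names and
struct toy_dentry are hypothetical stand-ins, not kernel API, and failure is
reported as NULL here where the kernel code returns ERR_PTR().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dentry; it only carries the link text. */
struct toy_dentry {
	const char *target;	/* where the symlink points */
	int put_calls;		/* how many times put_link ran */
};

/*
 * New-style ->follow_link(): return the body, store the thing to release
 * in *cookie.  The body is a heap copy, so the cookie is the buffer itself
 * (compare the configfs and kernfs conversions in the diff).
 */
static const char *toy_follow_link(struct toy_dentry *d, void **cookie)
{
	size_t n = strlen(d->target) + 1;
	char *buf = malloc(n);

	if (!buf)
		return NULL;	/* the kernel would return ERR_PTR(-ENOMEM) */
	memcpy(buf, d->target, n);
	return *cookie = buf;
}

/* New-style ->put_link(): no nameidata argument, just the cookie. */
static void toy_put_link(struct toy_dentry *d, void *cookie)
{
	assert(cookie != NULL);	/* callers skip put_link for a NULL cookie */
	d->put_calls++;
	free(cookie);
}

/*
 * Caller side, modelled loosely on the converted generic_readlink().
 * The cookie starts out NULL (an assumption of this sketch) so that the
 * check before put_link is well defined even if the callee never sets it.
 * Returns the length of the link body, or -1 on failure.
 */
static long toy_readlink_len(struct toy_dentry *d)
{
	void *cookie = NULL;
	const char *link = toy_follow_link(d, &cookie);
	long len;

	if (!link)
		return -1;
	len = (long)strlen(link);
	if (cookie)
		toy_put_link(d, cookie);
	return len;
}
```

Because the cookie is the only state passed to the release hook, instances
that have nothing to release (simple_follow_link() in the diff) can leave it
untouched and need no ->put_link() at all.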

2015-05-05 05:41:50

by Al Viro

Subject: [PATCH 30/79] namei.c: separate the parts of follow_link() that find the link body

From: Al Viro <[email protected]>

Split the part of fs/namei.c:follow_link() that obtains the link body
into a separate function, get_link(). follow_link() itself is converted
to calling get_link() and then traversing the body (if any).

The next step will expand the follow_link() call in link_path_walk(),
and this split helps to keep the size down.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 65 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 862e659..5f2c986 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -836,21 +836,22 @@ static int may_linkat(struct path *link)
return -EPERM;
}

-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline const char *
+get_link(struct path *link, struct nameidata *nd, void **p)
{
struct dentry *dentry = link->dentry;
+ struct inode *inode = dentry->d_inode;
int error;
- const char *s;
+ const char *res;

BUG_ON(nd->flags & LOOKUP_RCU);

if (link->mnt == nd->path.mnt)
mntget(link->mnt);

- error = -ELOOP;
+ res = ERR_PTR(-ELOOP);
if (unlikely(current->total_link_count >= 40))
- goto out_put_nd_path;
+ goto out;

cond_resched();
current->total_link_count++;
@@ -858,37 +859,43 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
touch_atime(link);

error = security_inode_follow_link(dentry);
+ res = ERR_PTR(error);
if (error)
- goto out_put_nd_path;
+ goto out;

nd->last_type = LAST_BIND;
*p = NULL;
- s = dentry->d_inode->i_op->follow_link(dentry, p, nd);
- error = PTR_ERR(s);
- if (IS_ERR(s))
- goto out_put_nd_path;
-
- error = 0;
- if (s) {
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd, link, *p);
+ res = inode->i_op->follow_link(dentry, p, nd);
+ if (IS_ERR(res)) {
+out:
+ path_put(&nd->path);
+ path_put(link);
}
+ return res;
+}

- return error;
+static __always_inline int
+follow_link(struct path *link, struct nameidata *nd, void **p)
+{
+ const char *s = get_link(link, nd, p);
+ int error;

-out_put_nd_path:
- path_put(&nd->path);
- path_put(link);
+ if (unlikely(IS_ERR(s)))
+ return PTR_ERR(s);
+ if (unlikely(!s))
+ return 0;
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ error = link_path_walk(s, nd);
+ if (unlikely(error))
+ put_link(nd, link, *p);
return error;
}

--
2.1.4
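
The split in this patch leaves follow_link() with three outcomes from
get_link(): an error pointer, a NULL body (nothing left to traverse), or a
string to walk. A rough user-space sketch of that control flow is below. The
toy_* helpers only imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and
struct toy_walk is a hypothetical stand-in for the state the real code keeps
in nameidata and current; none of it is the patch's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* User-space imitations of the kernel's error-pointer helpers. */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

#define TOY_MAX_LINKS 40	/* same total limit as in the patch */

struct toy_walk {
	int total_link_count;	/* stands in for current->total_link_count */
	int jumped_to_root;	/* set when the body is absolute */
	int walked;		/* number of bodies handed to the walker */
};

/* Step 1: obtain the link body, or an error pointer. */
static const char *toy_get_link(struct toy_walk *w, const char *body)
{
	if (w->total_link_count >= TOY_MAX_LINKS)
		return toy_err_ptr(-ELOOP);
	w->total_link_count++;
	return body;		/* may be NULL: nothing to traverse */
}

/* Step 2: traverse the body, if there is one. */
static int toy_follow_link(struct toy_walk *w, const char *body)
{
	const char *s = toy_get_link(w, body);

	if (toy_is_err(s))
		return (int)toy_ptr_err(s);
	if (!s)
		return 0;
	if (*s == '/')
		w->jumped_to_root = 1;	/* the kernel resets nd->path to root */
	w->walked++;		/* the kernel calls link_path_walk(s, nd) here */
	return 0;
}
```

Keeping the "obtain" step separate means a caller that wants to drive the
traversal itself only needs the first function, which is what the commit
message says the next step relies on.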

2015-05-05 05:27:12

by Al Viro

Subject: [PATCH 31/79] namei: don't bother with ->follow_link() if ->i_link is set

From: Al Viro <[email protected]>

With the new calling conventions it's trivial.

Signed-off-by: Al Viro <[email protected]>

---
fs/namei.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5f2c986..dbcbe42 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -865,11 +865,14 @@ get_link(struct path *link, struct nameidata *nd, void **p)

nd->last_type = LAST_BIND;
*p = NULL;
- res = inode->i_op->follow_link(dentry, p, nd);
- if (IS_ERR(res)) {
+ res = inode->i_link;
+ if (!res) {
+ res = inode->i_op->follow_link(dentry, p, nd);
+ if (IS_ERR(res)) {
out:
- path_put(&nd->path);
- path_put(link);
+ path_put(&nd->path);
+ path_put(link);
+ }
}
return res;
}
@@ -4411,11 +4414,14 @@ EXPORT_SYMBOL(readlink_copy);
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
void *cookie;
- const char *link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ const char *link = dentry->d_inode->i_link;
int res;

- if (IS_ERR(link))
- return PTR_ERR(link);
+ if (!link) {
+ link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
res = readlink_copy(buffer, buflen, link);
if (cookie && dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, cookie);
--
2.1.4
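
The shortcut in this patch can be pictured as follows: when the inode already
carries the body in ->i_link, use it and skip the method call; otherwise fall
back to ->follow_link(). This is a user-space sketch only. struct toy_inode
and the toy_* functions are hypothetical, and clearing the cookie before the
shortcut is a choice made here so that a later release check never reads an
unset value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
	const char *i_link;	/* cached body, or NULL */
	const char *slow_body;	/* what the method would produce */
	int method_calls;	/* how often the method was needed */
};

/* Stand-in for ->follow_link(): counts calls and sets a cookie. */
static const char *toy_follow_link_method(struct toy_inode *inode,
					  void **cookie)
{
	assert(cookie != NULL);
	inode->method_calls++;
	*cookie = inode;	/* pretend something needs releasing later */
	return inode->slow_body;
}

/* Prefer the cached body; call the method only when there is none. */
static const char *toy_get_link(struct toy_inode *inode, void **cookie)
{
	const char *res;

	*cookie = NULL;		/* nothing to release unless the method says so */
	res = inode->i_link;
	if (!res)
		res = toy_follow_link_method(inode, cookie);
	return res;
}
```

With the cookie left NULL on the fast path, the caller's usual "release only
if a cookie was set" check skips ->put_link() for cached bodies.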

2015-05-05 05:41:42

by Al Viro

Subject: [PATCH 32/79] namei: introduce nameidata->link

From: Al Viro <[email protected]>

nameidata->link shares space with nameidata->last; walk_component() et al.
store the struct path of the symlink there instead of returning it in a
variable passed by the caller.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 62 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dbcbe42..72a6130 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -494,7 +494,10 @@ EXPORT_SYMBOL(path_put);

struct nameidata {
struct path path;
- struct qstr last;
+ union {
+ struct qstr last;
+ struct path link;
+ };
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
@@ -1551,8 +1554,9 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static int walk_component(struct nameidata *nd, struct path *path, int follow)
+static int walk_component(struct nameidata *nd, int follow)
{
+ struct path path;
struct inode *inode;
int err;
/*
@@ -1562,38 +1566,39 @@ static int walk_component(struct nameidata *nd, struct path *path, int follow)
*/
if (unlikely(nd->last_type != LAST_NORM))
return handle_dots(nd, nd->last_type);
- err = lookup_fast(nd, path, &inode);
+ err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
goto out_err;

- err = lookup_slow(nd, path);
+ err = lookup_slow(nd, &path);
if (err < 0)
goto out_err;

- inode = path->dentry->d_inode;
+ inode = path.dentry->d_inode;
}
err = -ENOENT;
- if (d_is_negative(path->dentry))
+ if (d_is_negative(path.dentry))
goto out_path_put;

- if (should_follow_link(path->dentry, follow)) {
+ if (should_follow_link(path.dentry, follow)) {
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path->mnt ||
- unlazy_walk(nd, path->dentry))) {
+ if (unlikely(nd->path.mnt != path.mnt ||
+ unlazy_walk(nd, path.dentry))) {
err = -ECHILD;
goto out_err;
}
}
- BUG_ON(inode != path->dentry->d_inode);
+ BUG_ON(inode != path.dentry->d_inode);
+ nd->link = path;
return 1;
}
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
nd->inode = inode;
return 0;

out_path_put:
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
out_err:
terminate_walk(nd);
return err;
@@ -1606,12 +1611,12 @@ out_err:
* Without that kind of total limit, nasty chains of consecutive
* symlinks can cause almost arbitrarily long lookups.
*/
-static inline int nested_symlink(struct path *path, struct nameidata *nd)
+static inline int nested_symlink(struct nameidata *nd)
{
int res;

if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(path, nd);
+ path_put_conditional(&nd->link, nd);
path_put(&nd->path);
return -ELOOP;
}
@@ -1621,13 +1626,13 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
current->link_count++;

do {
- struct path link = *path;
+ struct path link = nd->link;
void *cookie;

res = follow_link(&link, nd, &cookie);
if (res)
break;
- res = walk_component(nd, path, LOOKUP_FOLLOW);
+ res = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
} while (res > 0);

@@ -1762,7 +1767,6 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct path next;
int err;

while (*name=='/')
@@ -1822,17 +1826,17 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

- err = walk_component(nd, &next, LOOKUP_FOLLOW);
+ err = walk_component(nd, LOOKUP_FOLLOW);
if (err < 0)
return err;

if (err) {
- err = nested_symlink(&next, nd);
+ err = nested_symlink(nd);
if (err)
return err;
}
if (!d_can_lookup(nd->path.dentry)) {
- err = -ENOTDIR;
+ err = -ENOTDIR;
break;
}
}
@@ -1952,20 +1956,19 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

-static inline int lookup_last(struct nameidata *nd, struct path *path)
+static inline int lookup_last(struct nameidata *nd)
{
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- return walk_component(nd, path, nd->flags & LOOKUP_FOLLOW);
+ return walk_component(nd, nd->flags & LOOKUP_FOLLOW);
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
static int path_lookupat(int dfd, const struct filename *name,
unsigned int flags, struct nameidata *nd)
{
- struct path path;
int err;

/*
@@ -1984,10 +1987,10 @@ static int path_lookupat(int dfd, const struct filename *name,
*/
err = path_init(dfd, name, flags, nd);
if (!err && !(flags & LOOKUP_PARENT)) {
- err = lookup_last(nd, &path);
+ err = lookup_last(nd);
while (err > 0) {
void *cookie;
- struct path link = path;
+ struct path link = nd->link;
err = may_follow_link(&link, nd);
if (unlikely(err))
break;
@@ -1995,7 +1998,7 @@ static int path_lookupat(int dfd, const struct filename *name,
err = follow_link(&link, nd, &cookie);
if (err)
break;
- err = lookup_last(nd, &path);
+ err = lookup_last(nd);
put_link(nd, &link, cookie);
}
}
@@ -2304,8 +2307,10 @@ done:
}
path->dentry = dentry;
path->mnt = nd->path.mnt;
- if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW))
+ if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
+ nd->link = *path;
return 1;
+ }
mntget(path->mnt);
follow_mount(path);
error = 0;
@@ -3034,6 +3039,7 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
+ nd->link = *path;
return 1;
}

@@ -3222,7 +3228,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,

error = do_last(nd, &path, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
- struct path link = path;
+ struct path link = nd->link;
void *cookie;
error = may_follow_link(&link, nd);
if (unlikely(error))
--
2.1.4

2015-05-05 05:26:33

by Al Viro

Subject: [PATCH 33/79] do_last: move path there from caller's stack frame

From: Al Viro <[email protected]>

We used to need it to feed to follow_link(). No more...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 50 +++++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 72a6130..c5d5e9e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2903,7 +2903,7 @@ out_dput:
/*
* Handle the last step of open()
*/
-static int do_last(struct nameidata *nd, struct path *path,
+static int do_last(struct nameidata *nd,
struct file *file, const struct open_flags *op,
int *opened, struct filename *name)
{
@@ -2914,6 +2914,7 @@ static int do_last(struct nameidata *nd, struct path *path,
int acc_mode = op->acc_mode;
struct inode *inode;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
+ struct path path;
bool retried = false;
int error;

@@ -2931,7 +2932,7 @@ static int do_last(struct nameidata *nd, struct path *path,
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
/* we _can_ be in RCU mode here */
- error = lookup_fast(nd, path, &inode);
+ error = lookup_fast(nd, &path, &inode);
if (likely(!error))
goto finish_lookup;

@@ -2969,7 +2970,7 @@ retry_lookup:
*/
}
mutex_lock(&dir->d_inode->i_mutex);
- error = lookup_open(nd, path, file, op, got_write, opened);
+ error = lookup_open(nd, &path, file, op, got_write, opened);
mutex_unlock(&dir->d_inode->i_mutex);

if (error <= 0) {
@@ -2989,15 +2990,15 @@ retry_lookup:
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
goto finish_open_created;
}

/*
* create/update audit record if it already exists.
*/
- if (d_is_positive(path->dentry))
- audit_inode(name, path->dentry, 0);
+ if (d_is_positive(path.dentry))
+ audit_inode(name, path.dentry, 0);

/*
* If atomic_open() acquired write access it is dropped now due to
@@ -3013,7 +3014,7 @@ retry_lookup:
if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
goto exit_dput;

- error = follow_managed(path, nd->flags);
+ error = follow_managed(&path, nd->flags);
if (error < 0)
goto exit_dput;

@@ -3021,40 +3022,40 @@ retry_lookup:
nd->flags |= LOOKUP_JUMPED;

BUG_ON(nd->flags & LOOKUP_RCU);
- inode = path->dentry->d_inode;
+ inode = path.dentry->d_inode;
finish_lookup:
/* we _can_ be in RCU mode here */
error = -ENOENT;
- if (d_is_negative(path->dentry)) {
- path_to_nameidata(path, nd);
+ if (d_is_negative(path.dentry)) {
+ path_to_nameidata(&path, nd);
goto out;
}

- if (should_follow_link(path->dentry, nd->flags & LOOKUP_FOLLOW)) {
+ if (should_follow_link(path.dentry, nd->flags & LOOKUP_FOLLOW)) {
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path->mnt ||
- unlazy_walk(nd, path->dentry))) {
+ if (unlikely(nd->path.mnt != path.mnt ||
+ unlazy_walk(nd, path.dentry))) {
error = -ECHILD;
goto out;
}
}
- BUG_ON(inode != path->dentry->d_inode);
- nd->link = *path;
+ BUG_ON(inode != path.dentry->d_inode);
+ nd->link = path;
return 1;
}

- if (unlikely(d_is_symlink(path->dentry)) && !(open_flag & O_PATH)) {
- path_to_nameidata(path, nd);
+ if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
+ path_to_nameidata(&path, nd);
error = -ELOOP;
goto out;
}

- if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path->mnt) {
- path_to_nameidata(path, nd);
+ if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
+ path_to_nameidata(&path, nd);
} else {
save_parent.dentry = nd->path.dentry;
- save_parent.mnt = mntget(path->mnt);
- nd->path.dentry = path->dentry;
+ save_parent.mnt = mntget(path.mnt);
+ nd->path.dentry = path.dentry;

}
nd->inode = inode;
@@ -3116,7 +3117,7 @@ out:
return error;

exit_dput:
- path_put_conditional(path, nd);
+ path_put_conditional(&path, nd);
goto out;
exit_fput:
fput(file);
@@ -3207,7 +3208,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
struct nameidata *nd, const struct open_flags *op, int flags)
{
struct file *file;
- struct path path;
int opened = 0;
int error;

@@ -3226,7 +3226,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
goto out;

- error = do_last(nd, &path, file, op, &opened, pathname);
+ error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = nd->link;
void *cookie;
@@ -3238,7 +3238,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
error = follow_link(&link, nd, &cookie);
if (unlikely(error))
break;
- error = do_last(nd, &path, file, op, &opened, pathname);
+ error = do_last(nd, file, op, &opened, pathname);
put_link(nd, &link, cookie);
}
out:
--
2.1.4

2015-05-05 05:26:22

by Al Viro

Subject: [PATCH 34/79] namei: expand nested_symlink() in its only caller

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 61 +++++++++++++++++++++++--------------------------------------
1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5d5e9e..fae70a4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1605,43 +1605,6 @@ out_err:
}

/*
- * This limits recursive symlink follows to 8, while
- * limiting consecutive symlinks to 40.
- *
- * Without that kind of total limit, nasty chains of consecutive
- * symlinks can cause almost arbitrarily long lookups.
- */
-static inline int nested_symlink(struct nameidata *nd)
-{
- int res;
-
- if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(&nd->link, nd);
- path_put(&nd->path);
- return -ELOOP;
- }
- BUG_ON(nd->depth >= MAX_NESTED_LINKS);
-
- nd->depth++;
- current->link_count++;
-
- do {
- struct path link = nd->link;
- void *cookie;
-
- res = follow_link(&link, nd, &cookie);
- if (res)
- break;
- res = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
- } while (res > 0);
-
- current->link_count--;
- nd->depth--;
- return res;
-}
-
-/*
* We can do the critical dentry name comparison and hashing
* operations one word at a time, but we are limited to:
*
@@ -1831,7 +1794,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;

if (err) {
- err = nested_symlink(nd);
+ if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
+ path_put_conditional(&nd->link, nd);
+ path_put(&nd->path);
+ return -ELOOP;
+ }
+ BUG_ON(nd->depth >= MAX_NESTED_LINKS);
+
+ nd->depth++;
+ current->link_count++;
+
+ do {
+ struct path link = nd->link;
+ void *cookie;
+
+ err = follow_link(&link, nd, &cookie);
+ if (err)
+ break;
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &link, cookie);
+ } while (err > 0);
+
+ current->link_count--;
+ nd->depth--;
if (err)
return err;
}
--
2.1.4

2015-05-05 05:41:33

by Al Viro

Subject: [PATCH 35/79] namei: expand the call of follow_link() in link_path_walk()

From: Al Viro <[email protected]>

... and strip __always_inline from follow_link() - remaining callers
don't need that.

Now link_path_walk() recursion is a direct one.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fae70a4..2243d18 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -880,8 +880,7 @@ out:
return res;
}

-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static int follow_link(struct path *link, struct nameidata *nd, void **p)
{
const char *s = get_link(link, nd, p);
int error;
@@ -1807,10 +1806,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
do {
struct path link = nd->link;
void *cookie;
+ const char *s = get_link(&link, nd, &cookie);

- err = follow_link(&link, nd, &cookie);
- if (err)
+ if (unlikely(IS_ERR(s))) {
+ err = PTR_ERR(s);
break;
+ }
+ err = 0;
+ if (likely(s)) {
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ err = link_path_walk(s, nd);
+ if (unlikely(err)) {
+ put_link(nd, &link, cookie);
+ break;
+ }
+ }
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
} while (err > 0);
--
2.1.4

2015-05-05 05:27:33

by Al Viro

Subject: [PATCH 36/79] namei: move the calls of may_follow_link() into follow_link()

From: Al Viro <[email protected]>

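All remaining calls of follow_link() are immediately preceded by a call of
may_follow_link(), so move that check (and setting LOOKUP_PARENT) into
follow_link() itself.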

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2243d18..a13e47d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -882,9 +882,12 @@ out:

static int follow_link(struct path *link, struct nameidata *nd, void **p)
{
- const char *s = get_link(link, nd, p);
- int error;
-
+ const char *s;
+ int error = may_follow_link(link, nd);
+ if (unlikely(error))
+ return error;
+ nd->flags |= LOOKUP_PARENT;
+ s = get_link(link, nd, p);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
@@ -1994,10 +1997,6 @@ static int path_lookupat(int dfd, const struct filename *name,
while (err > 0) {
void *cookie;
struct path link = nd->link;
- err = may_follow_link(&link, nd);
- if (unlikely(err))
- break;
- nd->flags |= LOOKUP_PARENT;
err = follow_link(&link, nd, &cookie);
if (err)
break;
@@ -2344,10 +2343,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
while (err > 0) {
void *cookie;
struct path link = *path;
- err = may_follow_link(&link, nd);
- if (unlikely(err))
- break;
- nd->flags |= LOOKUP_PARENT;
err = follow_link(&link, nd, &cookie);
if (err)
break;
@@ -3233,10 +3228,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = nd->link;
void *cookie;
- error = may_follow_link(&link, nd);
- if (unlikely(error))
- break;
- nd->flags |= LOOKUP_PARENT;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
error = follow_link(&link, nd, &cookie);
if (unlikely(error))
--
2.1.4

2015-05-05 05:41:26

by Al Viro

Subject: [PATCH 37/79] namei: rename follow_link to trailing_symlink, move it down

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 62 ++++++++++++++++++++++++++++++--------------------------------
1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a13e47d..7537ae0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -668,8 +668,6 @@ static __always_inline void set_root(struct nameidata *nd)
get_fs_root(current->fs, &nd->root);
}

-static int link_path_walk(const char *, struct nameidata *);
-
static __always_inline unsigned set_root_rcu(struct nameidata *nd)
{
struct fs_struct *fs = current->fs;
@@ -880,33 +878,6 @@ out:
return res;
}

-static int follow_link(struct path *link, struct nameidata *nd, void **p)
-{
- const char *s;
- int error = may_follow_link(link, nd);
- if (unlikely(error))
- return error;
- nd->flags |= LOOKUP_PARENT;
- s = get_link(link, nd, p);
- if (unlikely(IS_ERR(s)))
- return PTR_ERR(s);
- if (unlikely(!s))
- return 0;
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd, link, *p);
- return error;
-}
-
static int follow_up_rcu(struct path *path)
{
struct mount *mnt = real_mount(path->mnt);
@@ -1962,6 +1933,33 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

+static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
+{
+ const char *s;
+ int error = may_follow_link(link, nd);
+ if (unlikely(error))
+ return error;
+ nd->flags |= LOOKUP_PARENT;
+ s = get_link(link, nd, p);
+ if (unlikely(IS_ERR(s)))
+ return PTR_ERR(s);
+ if (unlikely(!s))
+ return 0;
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ error = link_path_walk(s, nd);
+ if (unlikely(error))
+ put_link(nd, link, *p);
+ return error;
+}
+
static inline int lookup_last(struct nameidata *nd)
{
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
@@ -1997,7 +1995,7 @@ static int path_lookupat(int dfd, const struct filename *name,
while (err > 0) {
void *cookie;
struct path link = nd->link;
- err = follow_link(&link, nd, &cookie);
+ err = trailing_symlink(&link, nd, &cookie);
if (err)
break;
err = lookup_last(nd);
@@ -2343,7 +2341,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
while (err > 0) {
void *cookie;
struct path link = *path;
- err = follow_link(&link, nd, &cookie);
+ err = trailing_symlink(&link, nd, &cookie);
if (err)
break;
err = mountpoint_last(nd, path);
@@ -3229,7 +3227,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
struct path link = nd->link;
void *cookie;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- error = follow_link(&link, nd, &cookie);
+ error = trailing_symlink(&link, nd, &cookie);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
--
2.1.4

2015-05-05 05:24:43

by Al Viro

Subject: [PATCH 38/79] link_path_walk: handle get_link() returning ERR_PTR() immediately

From: Al Viro <[email protected]>

If we get ERR_PTR() from get_link(), we are guaranteed to get err != 0
when we break out of do-while, so we are going to hit if (err) return err;
shortly after it. Pull that into the if (IS_ERR(s)) body.
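
For readers unfamiliar with the ERR_PTR() convention this message relies on, here is a minimal userspace sketch. It assumes only that a small negative errno is encoded in the top 4095 pointer values; the toy_* names and toy_get_link() are hypothetical stand-ins, not the kernel's macros or its get_link(). It shows the shape this patch arrives at: undo the depth bookkeeping and return as soon as the error pointer is seen.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (userspace imitation). */
static inline void *toy_err_ptr(long error)
{
    assert(error < 0 && error >= -TOY_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Recover the errno from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-TOY_MAX_ERRNO);
}

/* Hypothetical stand-in for get_link(): a link body or an error pointer. */
static const char *toy_get_link(int fail)
{
    if (fail)
        return toy_err_ptr(-ENOENT);
    return "target";
}

/* One nesting step: bump the depth, fetch the body, bail out early on error. */
static int toy_step(int fail, unsigned *depth)
{
    const char *s;

    (*depth)++;                       /* like nd->depth++ before get_link() */
    s = toy_get_link(fail);
    if (toy_is_err(s)) {
        (*depth)--;                   /* undo the bookkeeping ... */
        return (int)toy_ptr_err(s);   /* ... and return right away */
    }
    /* a real walker would traverse s here */
    (*depth)--;
    return 0;
}
```

Calling toy_step(1, &depth) yields -ENOENT and leaves depth where it started, which is the property the early return has to preserve.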

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7537ae0..ea528ca 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1784,7 +1784,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- break;
+ current->link_count--;
+ nd->depth--;
+ return err;
}
err = 0;
if (likely(s)) {
--
2.1.4

2015-05-05 05:41:18

by Al Viro

Subject: [PATCH 39/79] link_path_walk: don't bother with walk_component() after jumping link

From: Al Viro <[email protected]>

... it does nothing if nd->last_type is LAST_BIND.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ea528ca..d6235ea 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1789,7 +1789,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;
}
err = 0;
- if (likely(s)) {
+ if (unlikely(!s)) {
+ /* jumped */
+ put_link(nd, &link, cookie);
+ break;
+ } else {
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -1804,9 +1808,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)
put_link(nd, &link, cookie);
break;
}
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &link, cookie);
}
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
} while (err > 0);

current->link_count--;
--
2.1.4

2015-05-05 05:24:54

by Al Viro

Subject: [PATCH 40/79] link_path_walk: turn inner loop into explicit goto

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 61 ++++++++++++++++++++++++++++++++-----------------------------
1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d6235ea..d46df22 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1767,6 +1767,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;

if (err) {
+ struct path link;
+ void *cookie;
+ const char *s;
+
if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
path_put_conditional(&nd->link, nd);
path_put(&nd->path);
@@ -1777,41 +1781,40 @@ static int link_path_walk(const char *name, struct nameidata *nd)
nd->depth++;
current->link_count++;

- do {
- struct path link = nd->link;
- void *cookie;
- const char *s = get_link(&link, nd, &cookie);
-
- if (unlikely(IS_ERR(s))) {
- err = PTR_ERR(s);
- current->link_count--;
- nd->depth--;
- return err;
+l:
+ link = nd->link;
+ s = get_link(&link, nd, &cookie);
+
+ if (unlikely(IS_ERR(s))) {
+ err = PTR_ERR(s);
+ current->link_count--;
+ nd->depth--;
+ return err;
+ }
+ err = 0;
+ if (unlikely(!s)) {
+ /* jumped */
+ put_link(nd, &link, cookie);
+ } else {
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
}
- err = 0;
- if (unlikely(!s)) {
- /* jumped */
+ nd->inode = nd->path.dentry->d_inode;
+ err = link_path_walk(s, nd);
+ if (unlikely(err)) {
put_link(nd, &link, cookie);
- break;
} else {
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- err = link_path_walk(s, nd);
- if (unlikely(err)) {
- put_link(nd, &link, cookie);
- break;
- }
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
+ if (err > 0)
+ goto l;
}
- } while (err > 0);
+ }

current->link_count--;
nd->depth--;
--
2.1.4

2015-05-05 05:36:53

by Al Viro

Subject: [PATCH 41/79] link_path_walk: massage a bit more

From: Al Viro <[email protected]>

Pull the block that follows the if-else at the end of what used to be the
do-while body into every branch of that if-else. We are almost done with
the massage...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d46df22..d11165c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1795,6 +1795,8 @@ l:
if (unlikely(!s)) {
/* jumped */
put_link(nd, &link, cookie);
+ current->link_count--;
+ nd->depth--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1808,18 +1810,23 @@ l:
err = link_path_walk(s, nd);
if (unlikely(err)) {
put_link(nd, &link, cookie);
+ current->link_count--;
+ nd->depth--;
+ return err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
- if (err > 0)
+ current->link_count--;
+ nd->depth--;
+ if (err < 0)
+ return err;
+ if (err > 0) {
+ current->link_count++;
+ nd->depth++;
goto l;
+ }
}
}
-
- current->link_count--;
- nd->depth--;
- if (err)
- return err;
}
if (!d_can_lookup(nd->path.dentry)) {
err = -ENOTDIR;
--
2.1.4

2015-05-05 05:28:54

by Al Viro

Subject: [PATCH 42/79] link_path_walk: get rid of duplication

From: Al Viro <[email protected]>

What we do after the second walk_component() + put_link() + depth
decrement in there is exactly equivalent to what's done right
after the first walk_component(). Easy to verify and not at all
surprising, seeing that there we have just walked the last
component of nested symlink.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11165c..b740f6f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1763,6 +1763,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return 0;

err = walk_component(nd, LOOKUP_FOLLOW);
+Walked:
if (err < 0)
return err;

@@ -1781,7 +1782,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
nd->depth++;
current->link_count++;

-l:
link = nd->link;
s = get_link(&link, nd, &cookie);

@@ -1818,13 +1818,7 @@ l:
put_link(nd, &link, cookie);
current->link_count--;
nd->depth--;
- if (err < 0)
- return err;
- if (err > 0) {
- current->link_count++;
- nd->depth++;
- goto l;
- }
+ goto Walked;
}
}
}
--
2.1.4

2015-05-05 05:25:13

by Al Viro

Subject: [PATCH 43/79] link_path_walk: final preparations to killing recursion

From: Al Viro <[email protected]>

reduce the number of returns in there - turn all places
where it returns zero into goto OK and places where it
returns non-zero into goto Err. The only non-trivial
detail is that all breaks in the loop are guaranteed
to be with non-zero err.
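
The control-flow shape described here can be illustrated outside the kernel. The sketch below is a hypothetical component counter, not the real link_path_walk(): toy_count(), TOY_NAME_MAX and the rule that a name containing '.' is a non-directory are all invented for the example. What it is meant to show is that every zero return becomes goto OK, every explicit error becomes goto Err, and the only break out of the loop happens with err already non-zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 255

/* Count path components; toy rule: a name containing '.' is a non-directory. */
static int toy_count(const char *name, unsigned *count)
{
    int err;

    assert(name && count);
    *count = 0;

    while (*name == '/')
        name++;
    if (!*name)
        goto OK;

    for (;;) {
        size_t len = strcspn(name, "/");
        int is_file = memchr(name, '.', len) != NULL;

        if (len > TOY_NAME_MAX) {
            err = -ENAMETOOLONG;
            goto Err;
        }
        (*count)++;
        name += len;
        if (!*name)
            goto OK;
        /* it was '/': skip it and any further slashes */
        do {
            name++;
        } while (*name == '/');
        if (!*name)
            goto OK;
        if (is_file) {
            err = -ENOTDIR;     /* every break carries a non-zero err */
            break;
        }
    }
    /* reached only via break, so err != 0 here */
Err:
    return err;
OK:
    return 0;
}
```

With the exits funnelled through two labels, later changes (such as unwinding saved state on error) only need to touch the code at Err and OK.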

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b740f6f..ac73d9e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1708,7 +1708,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
while (*name=='/')
name++;
if (!*name)
- return 0;
+ goto OK;

/* At this point we know we have a real path component. */
for(;;) {
@@ -1751,7 +1751,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)

name += hashlen_len(hash_len);
if (!*name)
- return 0;
+ goto OK;
/*
* If it wasn't NUL, we know it was '/'. Skip that
* slash, and continue until no more slashes.
@@ -1760,12 +1760,12 @@ static int link_path_walk(const char *name, struct nameidata *nd)
name++;
} while (unlikely(*name == '/'));
if (!*name)
- return 0;
+ goto OK;

err = walk_component(nd, LOOKUP_FOLLOW);
Walked:
if (err < 0)
- return err;
+ goto Err;

if (err) {
struct path link;
@@ -1775,7 +1775,8 @@ Walked:
if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
path_put_conditional(&nd->link, nd);
path_put(&nd->path);
- return -ELOOP;
+ err = -ELOOP;
+ goto Err;
}
BUG_ON(nd->depth >= MAX_NESTED_LINKS);

@@ -1789,7 +1790,7 @@ Walked:
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
- return err;
+ goto Err;
}
err = 0;
if (unlikely(!s)) {
@@ -1812,7 +1813,7 @@ Walked:
put_link(nd, &link, cookie);
current->link_count--;
nd->depth--;
- return err;
+ goto Err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
@@ -1828,7 +1829,10 @@ Walked:
}
}
terminate_walk(nd);
+Err:
return err;
+OK:
+ return 0;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-05 05:41:10

by Al Viro

Subject: [PATCH 44/79] link_path_walk: kill the recursion

From: Al Viro <[email protected]>

absolutely straightforward now - the only variables we need to preserve
across the recursive call are name, link and cookie, and recursion depth
is limited (and is equal to nd->depth). So arrange an array of
triples to hold instances of those and be done with that.
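
The transformation is easier to see on a toy model than in the diff. The following userspace sketch is an illustration under heavy simplifying assumptions, not the kernel code: the symlink table, toy_get_link() and toy_walk() are hypothetical, only the saved name is kept per level (the kernel also saves link and cookie), absolute link bodies are not rebased to a root, and the last component of a link body gets no special treatment. It replaces the recursive call with a bounded array indexed by the nesting depth.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NESTED 8    /* plays the role of MAX_NESTED_LINKS */

/* Hypothetical symlink table: component name -> link body. */
struct toy_link {
    const char *name;
    const char *body;
};

static const struct toy_link toy_links[] = {
    { "a",    "b/c"  },
    { "b",    "x"    },
    { "loop", "loop" },
};

/* Return the link body for a component, or NULL if it is not a symlink. */
static const char *toy_get_link(const char *comp, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++) {
        const char *n = toy_links[i].name;

        if (strlen(n) == len && !strncmp(n, comp, len))
            return toy_links[i].body;
    }
    return NULL;
}

/* Expand all symlinks in name into out, without recursion. */
static int toy_walk(const char *name, char *out, size_t outsz)
{
    const char *stack[TOY_MAX_NESTED];  /* saved "rest of the name" per level */
    unsigned depth = 0;
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';

    for (;;) {
        const char *body;
        size_t len;

        while (*name == '/')
            name++;
        if (!*name) {
            if (!depth)
                return 0;           /* outermost name exhausted */
            name = stack[--depth];  /* "return" from the nested walk */
            continue;
        }
        len = strcspn(name, "/");
        body = toy_get_link(name, len);
        if (body) {
            if (depth >= TOY_MAX_NESTED)
                return -ELOOP;
            stack[depth++] = name + len;  /* save where to resume */
            name = body;                  /* "call": walk the link body */
            continue;
        }
        if (used + len + 2 > outsz)
            return -ENAMETOOLONG;
        out[used++] = '/';
        memcpy(out + used, name, len);
        used += len;
        out[used] = '\0';
        name += len;
    }
}
```

Walking "a/d" with this table visits x, c and d in that order, and the self-referencing "loop" entry runs out of stack slots and fails with -ELOOP instead of recursing without bound.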

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 40 +++++++++++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ac73d9e..3c622ad 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1703,8 +1703,14 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
+ struct saved {
+ struct path link;
+ void *cookie;
+ const char *name;
+ } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
int err;
-
+
+start:
while (*name=='/')
name++;
if (!*name)
@@ -1768,8 +1774,6 @@ Walked:
goto Err;

if (err) {
- struct path link;
- void *cookie;
const char *s;

if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
@@ -1782,22 +1786,25 @@ Walked:

nd->depth++;
current->link_count++;
+ last++;

- link = nd->link;
- s = get_link(&link, nd, &cookie);
+ last->link = nd->link;
+ s = get_link(&last->link, nd, &last->cookie);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
+ last--;
goto Err;
}
err = 0;
if (unlikely(!s)) {
/* jumped */
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1808,17 +1815,24 @@ Walked:
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
- err = link_path_walk(s, nd);
+ last->name = name;
+ name = s;
+ goto start;
+
+back:
+ name = last->name;
if (unlikely(err)) {
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
goto Err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
goto Walked;
}
}
@@ -1830,9 +1844,13 @@ Walked:
}
terminate_walk(nd);
Err:
- return err;
+ if (likely(!nd->depth))
+ return err;
+ goto back;
OK:
- return 0;
+ if (likely(!nd->depth))
+ return 0;
+ goto back;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-05 05:40:58

by Al Viro

Subject: [PATCH 45/79] link_path_walk: split "return from recursive call" path

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3c622ad..723dead 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1818,23 +1818,6 @@ Walked:
last->name = name;
name = s;
goto start;
-
-back:
- name = last->name;
- if (unlikely(err)) {
- put_link(nd, &last->link, last->cookie);
- current->link_count--;
- nd->depth--;
- last--;
- goto Err;
- } else {
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &last->link, last->cookie);
- current->link_count--;
- nd->depth--;
- last--;
- goto Walked;
- }
}
}
if (!d_can_lookup(nd->path.dentry)) {
@@ -1844,13 +1827,24 @@ back:
}
terminate_walk(nd);
Err:
- if (likely(!nd->depth))
- return err;
- goto back;
+ while (unlikely(nd->depth)) {
+ put_link(nd, &last->link, last->cookie);
+ current->link_count--;
+ nd->depth--;
+ last--;
+ }
+ return err;
OK:
- if (likely(!nd->depth))
- return 0;
- goto back;
+ if (unlikely(nd->depth)) {
+ name = last->name;
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &last->link, last->cookie);
+ current->link_count--;
+ nd->depth--;
+ last--;
+ goto Walked;
+ }
+ return 0;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-05 05:40:35

by Al Viro

Subject: [PATCH 46/79] link_path_walk: cleanup - turn goto start; into continue;

From: Al Viro <[email protected]>

Deal with skipping leading slashes before what used to be the
recursive call. That way we can get rid of that goto completely.
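
As a rough illustration of the slash handling that moves to the jump site, here is a small standalone helper. It is a sketch only: toy_next_name() and its jumped_to_root flag are hypothetical, and the real hunk also resets nd->path to the root where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given a link body, return the first byte that still needs walking.
 * For an absolute body, note the jump to the root and skip every
 * leading slash; a relative body is returned unchanged.
 */
static const char *toy_next_name(const char *s, int *jumped_to_root)
{
    assert(s && jumped_to_root);

    *jumped_to_root = 0;
    if (*s == '/') {
        *jumped_to_root = 1;    /* the kernel switches nd->path to nd->root here */
        while (*++s == '/')
            ;
    }
    return s;
}
```

If the returned pointer refers to an empty string (the body was just "/" or a run of slashes), nothing is left to walk, which corresponds to the goto OK branch in the hunk; otherwise the loop can simply continue with the new name.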

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 723dead..38ff968 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1710,11 +1710,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
} stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
int err;

-start:
while (*name=='/')
name++;
if (!*name)
- goto OK;
+ return 0;

/* At this point we know we have a real path component. */
for(;;) {
@@ -1813,11 +1812,15 @@ Walked:
nd->path = nd->root;
path_get(&nd->root);
nd->flags |= LOOKUP_JUMPED;
+ while (unlikely(*++s == '/'))
+ ;
}
nd->inode = nd->path.dentry->d_inode;
last->name = name;
+ if (!*s)
+ goto OK;
name = s;
- goto start;
+ continue;
}
}
if (!d_can_lookup(nd->path.dentry)) {
--
2.1.4

2015-05-05 05:24:33

by Al Viro

Subject: [PATCH 47/79] namei: move link/cookie pairs into nameidata

From: Al Viro <[email protected]>

Array of MAX_NESTED_LINKS + 1 elements put into nameidata;
what used to be a local array in link_path_walk() occupies
entries 1 .. MAX_NESTED_LINKS in it, link and cookie from
the trailing symlink handling loops - entry 0.

This is _not_ the final arrangement; just an easily verified
incremental step.
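
The slot assignment described above can be sketched in isolation. Everything below is a simplified userspace mock: the toy_* types stand in for struct path and struct nameidata, TOY_MAX_NESTED_LINKS is assumed to be 8 as in the comment removed earlier in the series, and the two accessors are hypothetical helpers that do not exist in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NESTED_LINKS 8

struct toy_path {
    void *mnt;
    void *dentry;
};

struct toy_saved {
    struct toy_path link;
    void *cookie;
    const char *name;
};

struct toy_nameidata {
    unsigned depth;
    /* entry 0: trailing symlink; entries 1..MAX: nested symlinks */
    struct toy_saved stack[TOY_MAX_NESTED_LINKS + 1];
};

/* Slot used by the trailing-symlink loops in the callers. */
static struct toy_saved *toy_trailing_slot(struct toy_nameidata *nd)
{
    return &nd->stack[0];
}

/* Slot used by the walker once depth has been bumped for a nested link. */
static struct toy_saved *toy_nested_slot(struct toy_nameidata *nd)
{
    assert(nd->depth >= 1 && nd->depth <= TOY_MAX_NESTED_LINKS);
    return &nd->stack[nd->depth];
}
```

Because the walker pre-increments its cursor before the first use, nested links never touch entry 0, so the trailing-symlink state and the nested state can share one array without clobbering each other.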

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 38ff968..e6d54ec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,11 @@ struct nameidata {
int last_type;
unsigned depth;
struct file *base;
+ struct saved {
+ struct path link;
+ void *cookie;
+ const char *name;
+ } stack[MAX_NESTED_LINKS + 1];
};

/*
@@ -1703,11 +1708,7 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct saved {
- struct path link;
- void *cookie;
- const char *name;
- } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
+ struct saved *last = nd->stack;
int err;

while (*name=='/')
@@ -2022,13 +2023,13 @@ static int path_lookupat(int dfd, const struct filename *name,
if (!err && !(flags & LOOKUP_PARENT)) {
err = lookup_last(nd);
while (err > 0) {
- void *cookie;
- struct path link = nd->link;
- err = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ err = trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (err)
break;
err = lookup_last(nd);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
}

@@ -2368,13 +2369,13 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,

err = mountpoint_last(nd, path);
while (err > 0) {
- void *cookie;
- struct path link = *path;
- err = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ err = trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (err)
break;
err = mountpoint_last(nd, path);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
out:
path_cleanup(nd);
@@ -3253,14 +3254,14 @@ static struct file *path_openat(int dfd, struct filename *pathname,

error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
- struct path link = nd->link;
- void *cookie;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- error = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ error= trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
out:
path_cleanup(nd);
--
2.1.4

2015-05-05 05:40:46

by Al Viro

Subject: [PATCH 48/79] namei: trim redundant arguments of trailing_symlink()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e6d54ec..22256ef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1963,14 +1963,15 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

-static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
+static int trailing_symlink(struct nameidata *nd)
{
const char *s;
- int error = may_follow_link(link, nd);
+ int error = may_follow_link(&nd->link, nd);
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
- s = get_link(link, nd, p);
+ nd->stack[0].link = nd->link;
+ s = get_link(&nd->stack[0].link, nd, &nd->stack[0].cookie);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
@@ -1986,7 +1987,7 @@ static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
if (unlikely(error))
- put_link(nd, link, *p);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
return error;
}

@@ -2023,9 +2024,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (!err && !(flags & LOOKUP_PARENT)) {
err = lookup_last(nd);
while (err > 0) {
- nd->stack[0].link = nd->link;
- err = trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ err = trailing_symlink(nd);
if (err)
break;
err = lookup_last(nd);
@@ -2369,9 +2368,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,

err = mountpoint_last(nd, path);
while (err > 0) {
- nd->stack[0].link = nd->link;
- err = trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ err = trailing_symlink(nd);
if (err)
break;
err = mountpoint_last(nd, path);
@@ -3255,9 +3252,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- nd->stack[0].link = nd->link;
- error= trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ error = trailing_symlink(nd);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
--
2.1.4

2015-05-05 05:36:46

by Al Viro

Subject: [PATCH 49/79] namei: trim redundant arguments of fs/namei.c:put_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 22256ef..f0066785 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -718,12 +718,13 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
nd->flags |= LOOKUP_JUMPED;
}

-static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
+static inline void put_link(struct nameidata *nd)
{
- struct inode *inode = link->dentry->d_inode;
- if (cookie && inode->i_op->put_link)
- inode->i_op->put_link(link->dentry, cookie);
- path_put(link);
+ struct saved *last = nd->stack + nd->depth;
+ struct inode *inode = last->link.dentry->d_inode;
+ if (last->cookie && inode->i_op->put_link)
+ inode->i_op->put_link(last->link.dentry, last->cookie);
+ path_put(&last->link);
}

int sysctl_protected_symlinks __read_mostly = 0;
@@ -1801,7 +1802,7 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1832,7 +1833,7 @@ Walked:
terminate_walk(nd);
Err:
while (unlikely(nd->depth)) {
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1842,7 +1843,7 @@ OK:
if (unlikely(nd->depth)) {
name = last->name;
err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1987,7 +1988,7 @@ static int trailing_symlink(struct nameidata *nd)
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
if (unlikely(error))
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
return error;
}

@@ -2028,7 +2029,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
}

@@ -2372,7 +2373,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
out:
path_cleanup(nd);
@@ -3256,7 +3257,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
out:
path_cleanup(nd);
--
2.1.4

2015-05-05 05:28:39

by Al Viro

Subject: [PATCH 50/79] namei: trim the arguments of get_link()

From: Al Viro <[email protected]>

Same story as the previous commit: get_link() now takes only nd and
finds its link/cookie slot at nd->stack + nd->depth.
---
fs/namei.c | 45 +++++++++++++++++++++------------------------
1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f0066785..8fd8ccb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -843,27 +843,33 @@ static int may_linkat(struct path *link)
return -EPERM;
}

-static __always_inline const char *
-get_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline
+const char *get_link(struct nameidata *nd)
{
- struct dentry *dentry = link->dentry;
+ struct saved *last = nd->stack + nd->depth;
+ struct dentry *dentry = nd->link.dentry;
struct inode *inode = dentry->d_inode;
int error;
const char *res;

BUG_ON(nd->flags & LOOKUP_RCU);

- if (link->mnt == nd->path.mnt)
- mntget(link->mnt);
+ if (nd->link.mnt == nd->path.mnt)
+ mntget(nd->link.mnt);

- res = ERR_PTR(-ELOOP);
- if (unlikely(current->total_link_count >= 40))
- goto out;
+ if (unlikely(current->total_link_count >= 40)) {
+ path_put(&nd->path);
+ path_put(&nd->link);
+ return ERR_PTR(-ELOOP);
+ }
+
+ last->link = nd->link;
+ last->cookie = NULL;

cond_resched();
current->total_link_count++;

- touch_atime(link);
+ touch_atime(&last->link);

error = security_inode_follow_link(dentry);
res = ERR_PTR(error);
@@ -871,14 +877,13 @@ get_link(struct path *link, struct nameidata *nd, void **p)
goto out;

nd->last_type = LAST_BIND;
- *p = NULL;
res = inode->i_link;
if (!res) {
- res = inode->i_op->follow_link(dentry, p, nd);
+ res = inode->i_op->follow_link(dentry, &last->cookie, nd);
if (IS_ERR(res)) {
out:
path_put(&nd->path);
- path_put(link);
+ path_put(&last->link);
}
}
return res;
@@ -1709,7 +1714,6 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct saved *last = nd->stack;
int err;

while (*name=='/')
@@ -1787,16 +1791,13 @@ Walked:

nd->depth++;
current->link_count++;
- last++;

- last->link = nd->link;
- s = get_link(&last->link, nd, &last->cookie);
+ s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
- last--;
goto Err;
}
err = 0;
@@ -1805,7 +1806,6 @@ Walked:
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1818,7 +1818,7 @@ Walked:
;
}
nd->inode = nd->path.dentry->d_inode;
- last->name = name;
+ nd->stack[nd->depth].name = name;
if (!*s)
goto OK;
name = s;
@@ -1836,17 +1836,15 @@ Err:
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
}
return err;
OK:
if (unlikely(nd->depth)) {
- name = last->name;
+ name = nd->stack[nd->depth].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
goto Walked;
}
return 0;
@@ -1971,8 +1969,7 @@ static int trailing_symlink(struct nameidata *nd)
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
- nd->stack[0].link = nd->link;
- s = get_link(&nd->stack[0].link, nd, &nd->stack[0].cookie);
+ s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
--
2.1.4

2015-05-05 05:28:47

by Al Viro

Subject: [PATCH 51/79] namei: remove restrictions on nesting depth

From: Al Viro <[email protected]>

The only remaining restriction is on the total number of symlinks
crossed; how they are nested does not matter.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 66 ++++++++++++++++++++++++++++++++++++++++-----------
include/linux/namei.h | 2 ++
2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8fd8ccb..078db1d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,7 @@ void path_put(const struct path *path)
}
EXPORT_SYMBOL(path_put);

+#define EMBEDDED_LEVELS 2
struct nameidata {
struct path path;
union {
@@ -509,9 +510,42 @@ struct nameidata {
struct path link;
void *cookie;
const char *name;
- } stack[MAX_NESTED_LINKS + 1];
+ } *stack, internal[EMBEDDED_LEVELS];
};

+static void set_nameidata(struct nameidata *nd)
+{
+ nd->stack = nd->internal;
+}
+
+static void restore_nameidata(struct nameidata *nd)
+{
+ if (nd->stack != nd->internal) {
+ kfree(nd->stack);
+ nd->stack = nd->internal;
+ }
+}
+
+static int __nd_alloc_stack(struct nameidata *nd)
+{
+ struct saved *p = kmalloc((MAXSYMLINKS + 1) * sizeof(struct saved),
+ GFP_KERNEL);
+ if (unlikely(!p))
+ return -ENOMEM;
+ memcpy(p, nd->internal, sizeof(nd->internal));
+ nd->stack = p;
+ return 0;
+}
+
+static inline int nd_alloc_stack(struct nameidata *nd)
+{
+ if (likely(nd->depth != EMBEDDED_LEVELS - 1))
+ return 0;
+ if (likely(nd->stack != nd->internal))
+ return 0;
+ return __nd_alloc_stack(nd);
+}
+
/*
* Path walking has 2 modes, rcu-walk and ref-walk (see
* Documentation/filesystems/path-lookup.txt). In situations when we can't
@@ -857,7 +891,7 @@ const char *get_link(struct nameidata *nd)
if (nd->link.mnt == nd->path.mnt)
mntget(nd->link.mnt);

- if (unlikely(current->total_link_count >= 40)) {
+ if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
path_put(&nd->path);
path_put(&nd->link);
return ERR_PTR(-ELOOP);
@@ -1781,22 +1815,18 @@ Walked:
if (err) {
const char *s;

- if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(&nd->link, nd);
- path_put(&nd->path);
- err = -ELOOP;
- goto Err;
+ err = nd_alloc_stack(nd);
+ if (unlikely(err)) {
+ path_to_nameidata(&nd->link, nd);
+ break;
}
- BUG_ON(nd->depth >= MAX_NESTED_LINKS);

nd->depth++;
- current->link_count++;

s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- current->link_count--;
nd->depth--;
goto Err;
}
@@ -1804,7 +1834,6 @@ Walked:
if (unlikely(!s)) {
/* jumped */
put_link(nd);
- current->link_count--;
nd->depth--;
} else {
if (*s == '/') {
@@ -1834,7 +1863,6 @@ Walked:
Err:
while (unlikely(nd->depth)) {
put_link(nd);
- current->link_count--;
nd->depth--;
}
return err;
@@ -1843,7 +1871,6 @@ OK:
name = nd->stack[nd->depth].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
- current->link_count--;
nd->depth--;
goto Walked;
}
@@ -2047,7 +2074,11 @@ static int path_lookupat(int dfd, const struct filename *name,
static int filename_lookup(int dfd, struct filename *name,
unsigned int flags, struct nameidata *nd)
{
- int retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+ int retval;
+
+ set_nameidata(nd);
+ retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+
if (unlikely(retval == -ECHILD))
retval = path_lookupat(dfd, name, flags, nd);
if (unlikely(retval == -ESTALE))
@@ -2055,6 +2086,7 @@ static int filename_lookup(int dfd, struct filename *name,

if (likely(!retval))
audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+ restore_nameidata(nd);
return retval;
}

@@ -2385,6 +2417,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
int error;
if (IS_ERR(name))
return PTR_ERR(name);
+ set_nameidata(&nd);
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
error = path_mountpoint(dfd, name, path, &nd, flags);
@@ -2392,6 +2425,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
+ restore_nameidata(&nd);
putname(name);
return error;
}
@@ -3281,11 +3315,13 @@ struct file *do_filp_open(int dfd, struct filename *pathname,
int flags = op->lookup_flags;
struct file *filp;

+ set_nameidata(&nd);
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
filp = path_openat(dfd, pathname, &nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL);
+ restore_nameidata(&nd);
return filp;
}

@@ -3299,6 +3335,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,

nd.root.mnt = mnt;
nd.root.dentry = dentry;
+ set_nameidata(&nd);

if (d_is_symlink(dentry) && op->intent & LOOKUP_OPEN)
return ERR_PTR(-ELOOP);
@@ -3312,6 +3349,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
file = path_openat(-1, filename, &nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL);
+ restore_nameidata(&nd);
putname(filename);
return file;
}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a5d5bed..3a6cc96 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -11,6 +11,8 @@ struct nameidata;

enum { MAX_NESTED_LINKS = 8 };

+#define MAXSYMLINKS 40
+
/*
* Type of the last component on LOOKUP_PARENT
*/
--
2.1.4
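
The storage scheme introduced in patch 51/79 (a couple of embedded slots, with a one-time spill to a heap array of fixed maximum size) can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: struct walk, walk_init(), walk_release(), walk_reserve(), walk_push() and walk_pop() are hypothetical names, malloc()/free() stand in for kmalloc()/kfree(), and the sketch uses the zero-based slot numbering that the later "nd->depth massage" patches arrive at rather than the exact test used in this patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EMBEDDED_LEVELS 2	/* same value as in the patch */
#define MAXSYMLINKS 40		/* same value as in the patch */

/* Hypothetical, trimmed-down stand-in for the kernel's struct saved. */
struct saved {
	const char *name;
};

/* Hypothetical stand-in for the relevant part of struct nameidata. */
struct walk {
	unsigned depth;
	struct saved *stack;
	struct saved internal[EMBEDDED_LEVELS];
};

/* Counterpart of set_nameidata(): start on the embedded slots. */
static void walk_init(struct walk *w)
{
	w->depth = 0;
	w->stack = w->internal;
}

/* Counterpart of restore_nameidata(): drop the heap array, if any. */
static void walk_release(struct walk *w)
{
	if (w->stack != w->internal) {
		free(w->stack);
		w->stack = w->internal;
	}
}

/* Make sure slot w->depth exists; 0 on success, -1 if out of memory. */
static int walk_reserve(struct walk *w)
{
	struct saved *p;

	if (w->depth != EMBEDDED_LEVELS)
		return 0;	/* embedded slot available, or already spilled */
	if (w->stack != w->internal)
		return 0;	/* already on the heap */
	p = malloc(MAXSYMLINKS * sizeof(*p));
	if (!p)
		return -1;
	memcpy(p, w->internal, sizeof(w->internal));
	w->stack = p;
	return 0;
}

static int walk_push(struct walk *w, const char *name)
{
	if (w->depth >= MAXSYMLINKS || walk_reserve(w))
		return -1;
	w->stack[w->depth++].name = name;
	return 0;
}

static const char *walk_pop(struct walk *w)
{
	assert(w->depth > 0);
	return w->stack[--w->depth].name;
}
```

The common case (no symlinks, or at most two nested ones) never touches the allocator; only the third level of nesting pays for a single allocation, which is then reused for the rest of the walk.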

2015-05-05 05:28:32

by Al Viro

Subject: [PATCH 52/79] link_path_walk: nd->depth massage, part 1

From: Al Viro <[email protected]>

nd->stack[0] is unused until the handling of trailing symlinks and
we want to get rid of that. Having fucked that transformation up
several times, I went for a bloody pedantic series of provably equivalent
transformations. Sorry.

Step 1: keep nd->depth higher by one in link_path_walk() - increment upon
entry, decrement on exits, adjust the arithmetic inside and surround the
calls of functions that care about nd->depth value (nd_alloc_stack(),
get_link(), put_link()) with decrement/increment pairs.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 078db1d..da23835 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1755,6 +1755,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

+ nd->depth++;
/* At this point we know we have a real path component. */
for(;;) {
u64 hash_len;
@@ -1815,15 +1816,18 @@ Walked:
if (err) {
const char *s;

+ nd->depth--;
err = nd_alloc_stack(nd);
+ nd->depth++;
if (unlikely(err)) {
path_to_nameidata(&nd->link, nd);
break;
}

nd->depth++;
-
+ nd->depth--;
s = get_link(nd);
+ nd->depth++;

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
@@ -1833,7 +1837,9 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
} else {
if (*s == '/') {
@@ -1847,7 +1853,7 @@ Walked:
;
}
nd->inode = nd->path.dentry->d_inode;
- nd->stack[nd->depth].name = name;
+ nd->stack[nd->depth - 1].name = name;
if (!*s)
goto OK;
name = s;
@@ -1861,19 +1867,25 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth)) {
+ while (unlikely(nd->depth > 1)) {
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
}
+ nd->depth--;
return err;
OK:
- if (unlikely(nd->depth)) {
- name = nd->stack[nd->depth].name;
+ if (unlikely(nd->depth > 1)) {
+ name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
goto Walked;
}
+ nd->depth--;
return 0;
}

--
2.1.4

2015-05-05 05:37:03

by Al Viro

Subject: [PATCH 53/79] link_path_walk: nd->depth massage, part 2

From: Al Viro <[email protected]>

collapse adjacent increment/decrement pairs.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index da23835..d8ad004 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1824,8 +1824,6 @@ Walked:
break;
}

- nd->depth++;
- nd->depth--;
s = get_link(nd);
nd->depth++;

@@ -1839,8 +1837,6 @@ Walked:
/* jumped */
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1870,8 +1866,6 @@ Err:
while (unlikely(nd->depth > 1)) {
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
}
nd->depth--;
return err;
@@ -1881,8 +1875,6 @@ OK:
err = walk_component(nd, LOOKUP_FOLLOW);
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
goto Walked;
}
nd->depth--;
--
2.1.4

2015-05-05 05:36:35

by Al Viro

Subject: [PATCH 54/79] link_path_walk: nd->depth massage, part 3

From: Al Viro <[email protected]>

remove decrement/increment surrounding nd_alloc_stack(), adjust the
test in it.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d8ad004..0d1fcaf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -539,7 +539,7 @@ static int __nd_alloc_stack(struct nameidata *nd)

static inline int nd_alloc_stack(struct nameidata *nd)
{
- if (likely(nd->depth != EMBEDDED_LEVELS - 1))
+ if (likely(nd->depth != EMBEDDED_LEVELS))
return 0;
if (likely(nd->stack != nd->internal))
return 0;
@@ -1816,9 +1816,7 @@ Walked:
if (err) {
const char *s;

- nd->depth--;
err = nd_alloc_stack(nd);
- nd->depth++;
if (unlikely(err)) {
path_to_nameidata(&nd->link, nd);
break;
--
2.1.4

2015-05-05 05:25:06

by Al Viro

Subject: [PATCH 55/79] link_path_walk: nd->depth massage, part 4

From: Al Viro <[email protected]>

lift increment/decrement into link_path_walk() callers.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0d1fcaf..800cf7e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1755,7 +1755,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

- nd->depth++;
/* At this point we know we have a real path component. */
for(;;) {
u64 hash_len;
@@ -1865,7 +1864,6 @@ Err:
nd->depth--;
put_link(nd);
}
- nd->depth--;
return err;
OK:
if (unlikely(nd->depth > 1)) {
@@ -1875,7 +1873,6 @@ OK:
put_link(nd);
goto Walked;
}
- nd->depth--;
return 0;
}

@@ -1978,7 +1975,10 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
return -ECHILD;
done:
current->total_link_count = 0;
- return link_path_walk(s, nd);
+ nd->depth++;
+ retval = link_path_walk(s, nd);
+ nd->depth--;
+ return retval;
}

static void path_cleanup(struct nameidata *nd)
@@ -2012,7 +2012,9 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
+ nd->depth++;
error = link_path_walk(s, nd);
+ nd->depth--;
if (unlikely(error))
put_link(nd);
return error;
--
2.1.4

2015-05-05 05:29:09

by Al Viro

Subject: [PATCH 56/79] trailing_symlink: nd->depth massage, part 5

From: Al Viro <[email protected]>

move increment of ->depth to the point where we'd discovered
that get_link() has not returned an error, adjust exits
accordingly.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 800cf7e..51bcbec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2001,8 +2001,11 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- if (unlikely(!s))
+ nd->depth++;
+ if (unlikely(!s)) {
+ nd->depth--;
return 0;
+ }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -2012,12 +2015,14 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
- nd->depth++;
error = link_path_walk(s, nd);
- nd->depth--;
- if (unlikely(error))
+ if (unlikely(error)) {
+ nd->depth--;
put_link(nd);
- return error;
+ return error;
+ }
+ nd->depth--;
+ return 0;
}

static inline int lookup_last(struct nameidata *nd)
--
2.1.4

2015-05-05 05:37:11

by Al Viro

Subject: [PATCH 57/79] get_link: nd->depth massage, part 6

From: Al Viro <[email protected]>

make get_link() increment nd->depth on successful exit

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 51bcbec..f81a029 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -918,8 +918,10 @@ const char *get_link(struct nameidata *nd)
out:
path_put(&nd->path);
path_put(&last->link);
+ return res;
}
}
+ nd->depth++;
return res;
}

@@ -1822,11 +1824,9 @@ Walked:
}

s = get_link(nd);
- nd->depth++;

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- nd->depth--;
goto Err;
}
err = 0;
@@ -2001,7 +2001,6 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- nd->depth++;
if (unlikely(!s)) {
nd->depth--;
return 0;
--
2.1.4

2015-05-05 05:36:39

by Al Viro

Subject: [PATCH 58/79] trailing_symlink: nd->depth massage, part 7

From: Al Viro <[email protected]>

move decrement of nd->depth on successful returns into the callers.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f81a029..b7b1c2b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2001,10 +2001,8 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- if (unlikely(!s)) {
- nd->depth--;
+ if (unlikely(!s))
return 0;
- }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -2020,7 +2018,6 @@ static int trailing_symlink(struct nameidata *nd)
put_link(nd);
return error;
}
- nd->depth--;
return 0;
}

@@ -2061,6 +2058,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
+ nd->depth--;
put_link(nd);
}
}
@@ -2410,6 +2408,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
+ nd->depth--;
put_link(nd);
}
out:
@@ -3296,6 +3295,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
+ nd->depth--;
put_link(nd);
}
out:
--
2.1.4

2015-05-05 05:36:21

by Al Viro

Subject: [PATCH 59/79] put_link: nd->depth massage, part 8

From: Al Viro <[email protected]>

all calls are preceded by decrement of nd->depth; move it into
put_link() itself.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b7b1c2b..0713961 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -754,7 +754,7 @@ void nd_jump_link(struct nameidata *nd, struct path *path)

static inline void put_link(struct nameidata *nd)
{
- struct saved *last = nd->stack + nd->depth;
+ struct saved *last = nd->stack + --nd->depth;
struct inode *inode = last->link.dentry->d_inode;
if (last->cookie && inode->i_op->put_link)
inode->i_op->put_link(last->link.dentry, last->cookie);
@@ -1832,7 +1832,6 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
- nd->depth--;
put_link(nd);
} else {
if (*s == '/') {
@@ -1860,16 +1859,13 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > 1)) {
- nd->depth--;
+ while (unlikely(nd->depth > 1))
put_link(nd);
- }
return err;
OK:
if (unlikely(nd->depth > 1)) {
name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
- nd->depth--;
put_link(nd);
goto Walked;
}
@@ -2013,12 +2009,9 @@ static int trailing_symlink(struct nameidata *nd)
}
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
- if (unlikely(error)) {
- nd->depth--;
+ if (unlikely(error))
put_link(nd);
- return error;
- }
- return 0;
+ return error;
}

static inline int lookup_last(struct nameidata *nd)
@@ -2058,7 +2051,6 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
- nd->depth--;
put_link(nd);
}
}
@@ -2408,7 +2400,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
- nd->depth--;
put_link(nd);
}
out:
@@ -3295,7 +3286,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- nd->depth--;
put_link(nd);
}
out:
--
2.1.4

2015-05-05 05:36:16

by Al Viro

Subject: [PATCH 60/79] link_path_walk: nd->depth massage, part 9

From: Al Viro <[email protected]>

Make link_path_walk() work with any value of nd->depth on entry -
memorize it and use it in tests instead of comparing with 1.
Don't bother with increment/decrement in path_init().

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0713961..10cb0be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1751,6 +1751,7 @@ static inline u64 hash_name(const char *name)
static int link_path_walk(const char *name, struct nameidata *nd)
{
int err;
+ int orig_depth = nd->depth;

while (*name=='/')
name++;
@@ -1859,11 +1860,11 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > 1))
+ while (unlikely(nd->depth > orig_depth))
put_link(nd);
return err;
OK:
- if (unlikely(nd->depth > 1)) {
+ if (unlikely(nd->depth > orig_depth)) {
name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
@@ -1971,10 +1972,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
return -ECHILD;
done:
current->total_link_count = 0;
- nd->depth++;
- retval = link_path_walk(s, nd);
- nd->depth--;
- return retval;
+ return link_path_walk(s, nd);
}

static void path_cleanup(struct nameidata *nd)
--
2.1.4

2015-05-05 05:36:07

by Al Viro

Subject: [PATCH 61/79] link_path_walk: nd->depth massage, part 10

From: Al Viro <[email protected]>

Get rid of orig_depth checks in the OK: logic. If nd->depth is
zero, we had been called from path_init() and we are done.
If it is greater than 1, we are not done, whether we'd been
called from path_init() or trailing_symlink(). And in the
case when it's 1, we might have been called from path_init()
and reached the end of a nested symlink (in which case
nd->stack[0].name will point to the rest of the pathname and
we are not done) or from trailing_symlink(), in which case
we are done.

Just have trailing_symlink() leave NULL in nd->stack[0].name
and use that to discriminate between those cases.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 10cb0be..f8371e5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1864,13 +1864,15 @@ Err:
put_link(nd);
return err;
OK:
- if (unlikely(nd->depth > orig_depth)) {
- name = nd->stack[nd->depth - 1].name;
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
- goto Walked;
- }
- return 0;
+ if (!nd->depth) /* called from path_init(), done */
+ return 0;
+ name = nd->stack[nd->depth - 1].name;
+ if (!name) /* called from trailing_symlink(), done */
+ return 0;
+
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd);
+ goto Walked;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
@@ -2006,6 +2008,7 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
+ nd->stack[0].name = NULL;
error = link_path_walk(s, nd);
if (unlikely(error))
put_link(nd);
--
2.1.4
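
The case analysis in patch 61/79 boils down to a single question at the OK: label: is there a saved pathname to resume, or not? A minimal sketch of that decision, assuming hypothetical trimmed-down structures (struct toy_nd and toy_resume() are illustration-only names, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LEVELS 4	/* arbitrary size for the illustration */

/* Hypothetical stand-in for the kernel's struct saved. */
struct saved {
	const char *name;	/* rest of the pathname to resume, or NULL */
};

/* Hypothetical stand-in for the relevant part of struct nameidata. */
struct toy_nd {
	unsigned depth;
	struct saved stack[TOY_LEVELS];
};

/*
 * Mirrors the OK: logic after the patch: returns the pathname to keep
 * walking, or NULL when the walk is finished.
 */
static const char *toy_resume(const struct toy_nd *nd)
{
	assert(nd->depth <= TOY_LEVELS);
	if (!nd->depth)
		return NULL;	/* called from path_init(), done */
	/* a NULL name marks the slot set up by trailing_symlink() */
	return nd->stack[nd->depth - 1].name;
}
```

With the sentinel in place the function no longer needs to know who called it; the bottom slot of the stack carries that information.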

2015-05-05 05:36:30

by Al Viro

Subject: [PATCH 62/79] link_path_walk: end of nd->depth massage

From: Al Viro <[email protected]>

get rid of orig_depth - we only use it on error exit to tell whether
to stop doing put_link() when depth reaches 0 (call from path_init())
or when it reaches 1 (call from trailing_symlink()). However, in
the latter case the caller would immediately follow with one more
put_link(). Just keep doing it until the depth reaches zero (and
simplify trailing_symlink() as a result).

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f8371e5..85646cc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1751,7 +1751,6 @@ static inline u64 hash_name(const char *name)
static int link_path_walk(const char *name, struct nameidata *nd)
{
int err;
- int orig_depth = nd->depth;

while (*name=='/')
name++;
@@ -1860,7 +1859,7 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > orig_depth))
+ while (unlikely(nd->depth))
put_link(nd);
return err;
OK:
@@ -2009,10 +2008,7 @@ static int trailing_symlink(struct nameidata *nd)
}
nd->inode = nd->path.dentry->d_inode;
nd->stack[0].name = NULL;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd);
- return error;
+ return link_path_walk(s, nd);
}

static inline int lookup_last(struct nameidata *nd)
--
2.1.4

2015-05-05 05:35:40

by Al Viro

Subject: [PATCH 63/79] namei: we never need more than MAXSYMLINKS entries in nd->stack

From: Al Viro <[email protected]>

The only reason we needed one more was that MAXSYMLINKS purely
nested symlinks could lead to path_init() using that many
entries in addition to nd->stack[0], which it left unused.

That can't happen now - path_init() starts with entry 0 (and
trailing_symlink() is called only when we'd already encountered
one symlink, so no more than MAXSYMLINKS-1 are left).

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 85646cc..e4ca3b1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -528,7 +528,7 @@ static void restore_nameidata(struct nameidata *nd)

static int __nd_alloc_stack(struct nameidata *nd)
{
- struct saved *p = kmalloc((MAXSYMLINKS + 1) * sizeof(struct saved),
+ struct saved *p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
GFP_KERNEL);
if (unlikely(!p))
return -ENOMEM;
--
2.1.4
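
Why MAXSYMLINKS slots are enough can be seen in a toy, userspace model of non-recursive resolution: every push onto the stack is preceded by an increment of the total link count, so the depth can never exceed the number of links crossed. The sketch below is an assumption-laden illustration, not the kernel algorithm: the symlink table, toy_readlink() and toy_resolve() are hypothetical, absolute paths and ".." are not modelled, and -1 stands in for -ELOOP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXSYMLINKS 40	/* same limit as in the series */

/* Hypothetical symlink table: component name -> link body. */
struct toy_link {
	const char *name;
	const char *target;
};

static const struct toy_link toy_links[] = {
	{ "a", "b/c" },		/* a -> b/c */
	{ "b", "x" },		/* b -> x */
	{ "loop", "loop" },	/* points at itself */
};

static const char *toy_readlink(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++)
		if (strlen(toy_links[i].name) == len &&
		    !memcmp(toy_links[i].name, name, len))
			return toy_links[i].target;
	return NULL;
}

/*
 * Expand every symlink in path into out, components joined by '/'.
 * Returns 0 on success, -1 on too many links or if out is too small.
 */
static int toy_resolve(const char *path, char *out, size_t size)
{
	const char *stack[MAXSYMLINKS];	/* what follows each open link */
	unsigned depth = 0, total = 0;
	size_t used = 0;

	if (!size)
		return -1;
	out[0] = '\0';
	for (;;) {
		const char *target;
		size_t len;

		while (*path == '/')
			path++;
		if (!*path) {
			if (!depth)
				return 0;	/* end of the original pathname */
			path = stack[--depth];	/* end of a link body: resume */
			continue;
		}
		len = strcspn(path, "/");
		target = toy_readlink(path, len);
		if (target) {
			if (total >= MAXSYMLINKS)
				return -1;	/* too many links crossed */
			total++;
			assert(depth < MAXSYMLINKS);	/* depth < total */
			stack[depth++] = path + len;
			path = target;
			continue;
		}
		if (used + len + 2 > size)
			return -1;
		if (used)
			out[used++] = '/';
		memcpy(out + used, path, len);
		used += len;
		out[used] = '\0';
		path += len;
	}
}
```

For example, resolving "a/d" crosses two links (a, then b) and yields "x/c/d", while "loop" is rejected once the total count reaches MAXSYMLINKS, regardless of how the links are nested.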

2015-05-05 05:35:31

by Al Viro

Subject: [PATCH 64/79] namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers

From: Al Viro <[email protected]>

follow_dotdot_rcu() does an equivalent of terminate_walk() on failure;
shifting it into callers makes for simpler rules and those callers
already have terminate_walk() on other failure exits.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e4ca3b1..e53f146 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1230,10 +1230,6 @@ static int follow_dotdot_rcu(struct nameidata *nd)
return 0;

failed:
- nd->flags &= ~LOOKUP_RCU;
- if (!(nd->flags & LOOKUP_ROOT))
- nd->root.mnt = NULL;
- rcu_read_unlock();
return -ECHILD;
}

@@ -1543,8 +1539,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
{
if (type == LAST_DOTDOT) {
if (nd->flags & LOOKUP_RCU) {
- if (follow_dotdot_rcu(nd))
- return -ECHILD;
+ return follow_dotdot_rcu(nd);
} else
follow_dotdot(nd);
}
@@ -1584,8 +1579,12 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM))
- return handle_dots(nd, nd->last_type);
+ if (unlikely(nd->last_type != LAST_NORM)) {
+ err = handle_dots(nd, nd->last_type);
+ if (err)
+ goto out_err;
+ return 0;
+ }
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
@@ -2973,8 +2972,10 @@ static int do_last(struct nameidata *nd,

if (nd->last_type != LAST_NORM) {
error = handle_dots(nd, nd->last_type);
- if (error)
+ if (unlikely(error)) {
+ terminate_walk(nd);
return error;
+ }
goto finish_open;
}

--
2.1.4

2015-05-05 05:36:02

by Al Viro

Subject: [PATCH 65/79] lift terminate_walk() into callers of walk_component()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e53f146..76bf620 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1579,20 +1579,16 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM)) {
- err = handle_dots(nd, nd->last_type);
- if (err)
- goto out_err;
- return 0;
- }
+ if (unlikely(nd->last_type != LAST_NORM))
+ return handle_dots(nd, nd->last_type);
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
- goto out_err;
+ return err;

err = lookup_slow(nd, &path);
if (err < 0)
- goto out_err;
+ return err;

inode = path.dentry->d_inode;
}
@@ -1604,8 +1600,7 @@ static int walk_component(struct nameidata *nd, int follow)
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path.mnt ||
unlazy_walk(nd, path.dentry))) {
- err = -ECHILD;
- goto out_err;
+ return -ECHILD;
}
}
BUG_ON(inode != path.dentry->d_inode);
@@ -1618,8 +1613,6 @@ static int walk_component(struct nameidata *nd, int follow)

out_path_put:
path_to_nameidata(&path, nd);
-out_err:
- terminate_walk(nd);
return err;
}

@@ -1811,7 +1804,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
err = walk_component(nd, LOOKUP_FOLLOW);
Walked:
if (err < 0)
- goto Err;
+ break;

if (err) {
const char *s;
@@ -2012,11 +2005,15 @@ static int trailing_symlink(struct nameidata *nd)

static inline int lookup_last(struct nameidata *nd)
{
+ int err;
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- return walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ if (err < 0)
+ terminate_walk(nd);
+ return err;
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
--
2.1.4

2015-05-05 05:34:09

by Al Viro

Subject: [PATCH 66/79] namei: lift (open-coded) terminate_walk() into callers of get_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 76bf620..f7bce07 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -892,7 +892,6 @@ const char *get_link(struct nameidata *nd)
mntget(nd->link.mnt);

if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
- path_put(&nd->path);
path_put(&nd->link);
return ERR_PTR(-ELOOP);
}
@@ -916,7 +915,6 @@ const char *get_link(struct nameidata *nd)
res = inode->i_op->follow_link(dentry, &last->cookie, nd);
if (IS_ERR(res)) {
out:
- path_put(&nd->path);
path_put(&last->link);
return res;
}
@@ -1819,7 +1817,7 @@ Walked:

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- goto Err;
+ break;
}
err = 0;
if (unlikely(!s)) {
@@ -1850,7 +1848,6 @@ Walked:
}
}
terminate_walk(nd);
-Err:
while (unlikely(nd->depth))
put_link(nd);
return err;
@@ -1986,8 +1983,10 @@ static int trailing_symlink(struct nameidata *nd)
return error;
nd->flags |= LOOKUP_PARENT;
s = get_link(nd);
- if (unlikely(IS_ERR(s)))
+ if (unlikely(IS_ERR(s))) {
+ terminate_walk(nd);
return PTR_ERR(s);
+ }
if (unlikely(!s))
return 0;
if (*s == '/') {
--
2.1.4

2015-05-05 05:35:09

by Al Viro

Subject: [PATCH 67/79] namei: take put_link() into {lookup,mountpoint,do}_last()

From: Al Viro <[email protected]>

rationale: we'll need to have terminate_walk() do put_link() on
everything, which will mean that in some cases ..._last() will do
put_link() anyway. Easier to have them do it in all cases.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f7bce07..0fbc9b7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2010,6 +2010,8 @@ static inline int lookup_last(struct nameidata *nd)

nd->flags &= ~LOOKUP_PARENT;
err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ if (nd->depth)
+ put_link(nd);
if (err < 0)
terminate_walk(nd);
return err;
@@ -2037,13 +2039,10 @@ static int path_lookupat(int dfd, const struct filename *name,
*/
err = path_init(dfd, name, flags, nd);
if (!err && !(flags & LOOKUP_PARENT)) {
- err = lookup_last(nd);
- while (err > 0) {
+ while ((err = lookup_last(nd)) > 0) {
err = trailing_symlink(nd);
if (err)
break;
- err = lookup_last(nd);
- put_link(nd);
}
}

@@ -2354,6 +2353,8 @@ done:
dput(dentry);
goto out;
}
+ if (nd->depth)
+ put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
@@ -2365,6 +2366,8 @@ done:
error = 0;
out:
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;
}

@@ -2386,13 +2389,10 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (unlikely(err))
goto out;

- err = mountpoint_last(nd, path);
- while (err > 0) {
+ while ((err = mountpoint_last(nd, path)) > 0) {
err = trailing_symlink(nd);
if (err)
break;
- err = mountpoint_last(nd, path);
- put_link(nd);
}
out:
path_cleanup(nd);
@@ -2970,6 +2970,8 @@ static int do_last(struct nameidata *nd,
error = handle_dots(nd, nd->last_type);
if (unlikely(error)) {
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;
}
goto finish_open;
@@ -2995,8 +2997,11 @@ static int do_last(struct nameidata *nd,
* about to look up
*/
error = complete_walk(nd);
- if (error)
+ if (error) {
+ if (nd->depth)
+ put_link(nd);
return error;
+ }

audit_inode(name, dir, LOOKUP_PARENT);
error = -EISDIR;
@@ -3087,6 +3092,8 @@ finish_lookup:
}
}
BUG_ON(inode != path.dentry->d_inode);
+ if (nd->depth)
+ put_link(nd);
nd->link = path;
return 1;
}
@@ -3110,6 +3117,8 @@ finish_lookup:
finish_open:
error = complete_walk(nd);
if (error) {
+ if (nd->depth)
+ put_link(nd);
path_put(&save_parent);
return error;
}
@@ -3161,6 +3170,8 @@ out:
mnt_drop_write(nd->path.mnt);
path_put(&save_parent);
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;

exit_dput:
@@ -3273,14 +3284,11 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
goto out;

- error = do_last(nd, file, op, &opened, pathname);
- while (unlikely(error > 0)) { /* trailing symlink */
+ while ((error = do_last(nd, file, op, &opened, pathname)) > 0) {
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
error = trailing_symlink(nd);
if (unlikely(error))
break;
- error = do_last(nd, file, op, &opened, pathname);
- put_link(nd);
}
out:
path_cleanup(nd);
--
2.1.4
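
To illustrate the loop shape that 67/79 gives path_lookupat(), path_mountpoint() and path_openat(), here is a minimal userspace sketch. Everything in it (struct toy_nd, toy_last(), toy_trailing_symlink(), toy_lookup()) is a hypothetical stand-in, not kernel code; the only thing it tries to show is that once the ..._last() helper drops the previous link body itself, the caller needs no put_link() of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata; not the kernel structure. */
struct toy_nd {
	int links_left;   /* trailing symlinks still to be traversed */
	unsigned depth;   /* link bodies currently held */
	int puts;         /* number of put_link()-like releases so far */
};

/* Models lookup_last()/mountpoint_last()/do_last() after the patch:
 * it releases the link body left over from the previous iteration
 * itself, then reports whether the last component is a symlink (1)
 * or not (0). */
static int toy_last(struct toy_nd *nd)
{
	if (nd->depth) {
		nd->depth--;
		nd->puts++;
	}
	return nd->links_left > 0 ? 1 : 0;
}

/* Models trailing_symlink(): pick the link body up and hold it. */
static int toy_trailing_symlink(struct toy_nd *nd)
{
	assert(nd->links_left > 0);
	nd->links_left--;
	nd->depth++;
	return 0;
}

/* The caller: same structure as the loops in the patch. */
static int toy_lookup(struct toy_nd *nd)
{
	int err;

	while ((err = toy_last(nd)) > 0) {
		err = toy_trailing_symlink(nd);
		if (err)
			break;
	}
	return err;
}
```

With three trailing symlinks in a row the helper ends up doing three releases and the walk finishes with nothing held; the caller never touches the link stack.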

2015-05-05 05:35:16

by Al Viro

Subject: [PATCH 68/79] namei: have terminate_walk() do put_link() on everything left

From: Al Viro <[email protected]>

All callers of terminate_walk() are followed by a more or less
open-coded equivalent of "do put_link() on everything left
in nd->stack". Better done in terminate_walk() itself, and
when we go for RCU symlink traversal we'll have to do it
there anyway.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0fbc9b7..3faf3e2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1554,6 +1554,8 @@ static void terminate_walk(struct nameidata *nd)
nd->root.mnt = NULL;
rcu_read_unlock();
}
+ while (unlikely(nd->depth))
+ put_link(nd);
}

/*
@@ -1848,8 +1850,6 @@ Walked:
}
}
terminate_walk(nd);
- while (unlikely(nd->depth))
- put_link(nd);
return err;
OK:
if (!nd->depth) /* called from path_init(), done */
@@ -2366,8 +2366,6 @@ done:
error = 0;
out:
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;
}

@@ -2970,8 +2968,6 @@ static int do_last(struct nameidata *nd,
error = handle_dots(nd, nd->last_type);
if (unlikely(error)) {
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;
}
goto finish_open;
@@ -3170,8 +3166,6 @@ out:
mnt_drop_write(nd->path.mnt);
path_put(&save_parent);
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;

exit_dput:
--
2.1.4
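
A rough userspace model of what 68/79 is after, assuming a much simplified link stack: the names here (struct toy_nd, toy_push_link(), toy_put_link(), toy_terminate_walk()) are made up for illustration and only mirror the idea that terminating a walk drains whatever is still on nd->stack, so callers do not have to.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXSYMLINKS 40

/* Hypothetical stand-in for struct nameidata: only the link stack. */
struct toy_nd {
	unsigned depth;                      /* links currently held */
	int released;                        /* put_link()-like calls so far */
	const char *stack[TOY_MAXSYMLINKS];  /* link bodies */
};

/* Hold one more link body; fails once the stack is full. */
static int toy_push_link(struct toy_nd *nd, const char *body)
{
	if (nd->depth >= TOY_MAXSYMLINKS)
		return -1;
	nd->stack[nd->depth++] = body;
	return 0;
}

/* Release the most recently held link body. */
static void toy_put_link(struct toy_nd *nd)
{
	assert(nd->depth > 0);
	nd->stack[--nd->depth] = NULL;
	nd->released++;
}

/* Models terminate_walk() after the patch: besides whatever else it
 * cleans up (omitted here), it releases every link still held. */
static void toy_terminate_walk(struct toy_nd *nd)
{
	while (nd->depth)
		toy_put_link(nd);
}
```

Calling toy_terminate_walk() a second time is a no-op in this model, since the loop is guarded by depth.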

2015-05-05 05:34:53

by Al Viro

Subject: [PATCH 69/79] link_path_walk: move the OK: inside the loop

From: Al Viro <[email protected]>

fewer labels that way; in particular, resuming after the end of
a nested symlink is straight-line.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3faf3e2..49ed4b5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1798,11 +1798,21 @@ static int link_path_walk(const char *name, struct nameidata *nd)
do {
name++;
} while (unlikely(*name == '/'));
- if (!*name)
- goto OK;
-
- err = walk_component(nd, LOOKUP_FOLLOW);
-Walked:
+ if (unlikely(!*name)) {
+OK:
+ /* called from path_init(), done */
+ if (!nd->depth)
+ return 0;
+ name = nd->stack[nd->depth - 1].name;
+ /* called from trailing_symlink(), done */
+ if (!name)
+ return 0;
+ /* last component of nested symlink */
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd);
+ } else {
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ }
if (err < 0)
break;

@@ -1851,16 +1861,6 @@ Walked:
}
terminate_walk(nd);
return err;
-OK:
- if (!nd->depth) /* called from path_init(), done */
- return 0;
- name = nd->stack[nd->depth - 1].name;
- if (!name) /* called from trailing_symlink(), done */
- return 0;
-
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
- goto Walked;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-05 05:34:01

by Al Viro

Subject: [PATCH 70/79] namei: new calling conventions for walk_component()

From: Al Viro <[email protected]>

instead of a single flag (!= 0 => we want to follow symlinks) pass
two bits - WALK_GET (want to follow symlinks) and WALK_PUT (put_link()
once we are done looking at the name). The latter matters only for
success exits - on failure the caller will discard everything anyway.

Suggestions for a better variant are welcome; what this thing aims for
is making sure that pending put_link() is done *before* walk_component()
decides to pick a symlink up, rather than between picking it up and
acting upon it. See the next commit for payoff.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 49ed4b5..5eb8b0c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1569,7 +1569,9 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static int walk_component(struct nameidata *nd, int follow)
+enum {WALK_GET = 1, WALK_PUT = 2};
+
+static int walk_component(struct nameidata *nd, int flags)
{
struct path path;
struct inode *inode;
@@ -1579,8 +1581,12 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM))
- return handle_dots(nd, nd->last_type);
+ if (unlikely(nd->last_type != LAST_NORM)) {
+ err = handle_dots(nd, nd->last_type);
+ if (flags & WALK_PUT)
+ put_link(nd);
+ return err;
+ }
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
@@ -1596,7 +1602,9 @@ static int walk_component(struct nameidata *nd, int follow)
if (d_is_negative(path.dentry))
goto out_path_put;

- if (should_follow_link(path.dentry, follow)) {
+ if (flags & WALK_PUT)
+ put_link(nd);
+ if (should_follow_link(path.dentry, flags & WALK_GET)) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path.mnt ||
unlazy_walk(nd, path.dentry))) {
@@ -1808,10 +1816,9 @@ OK:
if (!name)
return 0;
/* last component of nested symlink */
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
+ err = walk_component(nd, WALK_GET | WALK_PUT);
} else {
- err = walk_component(nd, LOOKUP_FOLLOW);
+ err = walk_component(nd, WALK_GET);
}
if (err < 0)
break;
@@ -2009,9 +2016,12 @@ static inline int lookup_last(struct nameidata *nd)
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
- if (nd->depth)
- put_link(nd);
+ err = walk_component(nd,
+ nd->flags & LOOKUP_FOLLOW
+ ? nd->depth
+ ? WALK_PUT | WALK_GET
+ : WALK_GET
+ : 0);
if (err < 0)
terminate_walk(nd);
return err;
--
2.1.4
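
The ordering that 70/79 describes can be sketched in userspace as follows. WALK_GET and WALK_PUT have the values from the patch; everything else (struct toy_nd, the event log, toy_walk_component()) is a hypothetical model, and unlike the real code it bumps depth at pick-up time just to keep the bookkeeping visible. The point is only that the pending put happens after the lookup but before a new symlink is picked up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { WALK_GET = 1, WALK_PUT = 2 };  /* same values as in the patch */

/* Hypothetical stand-in for struct nameidata. */
struct toy_nd {
	unsigned depth;  /* links currently held */
	char log[64];    /* order of events, for illustration only */
};

/* Append an event name to the log if it still fits. */
static void toy_log(struct toy_nd *nd, const char *ev)
{
	if (strlen(nd->log) + strlen(ev) + 1 < sizeof(nd->log))
		strcat(nd->log, ev);
}

static void toy_put_link(struct toy_nd *nd)
{
	assert(nd->depth > 0);
	nd->depth--;
	toy_log(nd, "put;");
}

/* Returns 1 if a symlink was picked up, 0 otherwise. */
static int toy_walk_component(struct toy_nd *nd, int is_symlink, int flags)
{
	/* the name is still needed while we look the component up */
	toy_log(nd, "lookup;");
	/* done with the name: drop the link body it came from ... */
	if (flags & WALK_PUT)
		toy_put_link(nd);
	/* ... and only then decide whether to pick a new link up */
	if (is_symlink && (flags & WALK_GET)) {
		nd->depth++;
		toy_log(nd, "pick;");
		return 1;
	}
	return 0;
}
```

For the last component of a nested symlink that is itself a symlink, the log reads lookup, put, pick, and the number of held links stays the same.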

2015-05-05 05:33:16

by Al Viro

Subject: [PATCH 71/79] namei: make should_follow_link() store the link in nd->link

From: Al Viro <[email protected]>

... if it decides to follow, that is.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 63 +++++++++++++++++++++++++++++++++-----------------------------
1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5eb8b0c..026d7df 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1558,15 +1558,31 @@ static void terminate_walk(struct nameidata *nd)
put_link(nd);
}

+static int pick_link(struct nameidata *nd, struct path *link)
+{
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(nd->path.mnt != link->mnt ||
+ unlazy_walk(nd, link->dentry))) {
+ return -ECHILD;
+ }
+ }
+ nd->link = *link;
+ return 1;
+}
+
/*
* Do we need to follow links? We _really_ want to be able
* to do this check without having to look at inode->i_op,
* so we keep a cache of "no, this doesn't need follow_link"
* for the common case.
*/
-static inline int should_follow_link(struct dentry *dentry, int follow)
+static inline int should_follow_link(struct nameidata *nd, struct path *link, int follow)
{
- return unlikely(d_is_symlink(dentry)) ? follow : 0;
+ if (likely(!d_is_symlink(link->dentry)))
+ return 0;
+ if (!follow)
+ return 0;
+ return pick_link(nd, link);
}

enum {WALK_GET = 1, WALK_PUT = 2};
@@ -1604,17 +1620,9 @@ static int walk_component(struct nameidata *nd, int flags)

if (flags & WALK_PUT)
put_link(nd);
- if (should_follow_link(path.dentry, flags & WALK_GET)) {
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path.mnt ||
- unlazy_walk(nd, path.dentry))) {
- return -ECHILD;
- }
- }
- BUG_ON(inode != path.dentry->d_inode);
- nd->link = path;
- return 1;
- }
+ err = should_follow_link(nd, &path, flags & WALK_GET);
+ if (unlikely(err))
+ return err;
path_to_nameidata(&path, nd);
nd->inode = inode;
return 0;
@@ -2367,9 +2375,11 @@ done:
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
- if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
- nd->link = *path;
- return 1;
+ error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW);
+ if (unlikely(error)) {
+ if (error < 0)
+ goto out;
+ return error;
}
mntget(path->mnt);
follow_mount(path);
@@ -3089,19 +3099,14 @@ finish_lookup:
goto out;
}

- if (should_follow_link(path.dentry, nd->flags & LOOKUP_FOLLOW)) {
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path.mnt ||
- unlazy_walk(nd, path.dentry))) {
- error = -ECHILD;
- goto out;
- }
- }
- BUG_ON(inode != path.dentry->d_inode);
- if (nd->depth)
- put_link(nd);
- nd->link = path;
- return 1;
+ if (nd->depth)
+ put_link(nd);
+
+ error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW);
+ if (unlikely(error)) {
+ if (error < 0)
+ goto out;
+ return error;
}

if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
--
2.1.4

2015-05-05 05:35:00

by Al Viro

Subject: [PATCH 72/79] namei: move link count check and stack allocation into pick_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 026d7df..9822b37 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -891,16 +891,10 @@ const char *get_link(struct nameidata *nd)
if (nd->link.mnt == nd->path.mnt)
mntget(nd->link.mnt);

- if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
- path_put(&nd->link);
- return ERR_PTR(-ELOOP);
- }
-
last->link = nd->link;
last->cookie = NULL;

cond_resched();
- current->total_link_count++;

touch_atime(&last->link);

@@ -1560,12 +1554,23 @@ static void terminate_walk(struct nameidata *nd)

static int pick_link(struct nameidata *nd, struct path *link)
{
+ int error;
+ if (unlikely(current->total_link_count++ >= MAXSYMLINKS)) {
+ path_to_nameidata(link, nd);
+ return -ELOOP;
+ }
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != link->mnt ||
unlazy_walk(nd, link->dentry))) {
return -ECHILD;
}
}
+ error = nd_alloc_stack(nd);
+ if (unlikely(error)) {
+ path_to_nameidata(link, nd);
+ return error;
+ }
+
nd->link = *link;
return 1;
}
@@ -1832,15 +1837,7 @@ OK:
break;

if (err) {
- const char *s;
-
- err = nd_alloc_stack(nd);
- if (unlikely(err)) {
- path_to_nameidata(&nd->link, nd);
- break;
- }
-
- s = get_link(nd);
+ const char *s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
--
2.1.4

2015-05-05 05:34:43

by Al Viro

Subject: [PATCH 73/79] lustre: rip the private symlink nesting limit out

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
drivers/staging/lustre/lustre/llite/symlink.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index e488cb3..da6d9d1 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -126,18 +126,9 @@ static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct n
char *symname = NULL;

CDEBUG(D_VFSTRACE, "VFS Op\n");
- /* Limit the recursive symlink depth to 5 instead of default
- * 8 links when kernel has 4k stack to prevent stack overflow.
- * For 8k stacks we need to limit it to 7 for local servers. */
- if (THREAD_SIZE < 8192 && current->link_count >= 6) {
- rc = -ELOOP;
- } else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
- rc = -ELOOP;
- } else {
- ll_inode_size_lock(inode);
- rc = ll_readlink_internal(inode, &request, &symname);
- ll_inode_size_unlock(inode);
- }
+ ll_inode_size_lock(inode);
+ rc = ll_readlink_internal(inode, &request, &symname);
+ ll_inode_size_unlock(inode);
if (rc) {
ptlrpc_req_finished(request);
return ERR_PTR(rc);
--
2.1.4

2015-05-05 05:32:45

by Al Viro

Subject: [PATCH 74/79] VFS: replace {, total_}link_count in task_struct with pointer to nameidata

From: NeilBrown <[email protected]>

task_struct currently contains two ad-hoc members for use by the VFS:
link_count and total_link_count. These are only interesting to fs/namei.c,
so exposing them explicitly is poor layering. Incidentally, link_count
isn't used anymore, so it can just die.

This patch replaces those with a single pointer to 'struct nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead in the _very_ few instances that care about
it at all.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save the old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.

follow_mount and follow_automount now get a 'struct nameidata *'
rather than 'int flags' so that they can directly access
total_link_count, rather than going through 'current'.

Suggested-by: Al Viro <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 70 ++++++++++++++++++++++++++++-----------------------
include/linux/sched.h | 2 +-
2 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9822b37..5ac1079 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,7 @@ struct nameidata {
unsigned seq, m_seq;
int last_type;
unsigned depth;
+ int total_link_count;
struct file *base;
struct saved {
struct path link;
@@ -513,16 +514,25 @@ struct nameidata {
} *stack, internal[EMBEDDED_LEVELS];
};

-static void set_nameidata(struct nameidata *nd)
+static struct nameidata *set_nameidata(struct nameidata *p)
{
- nd->stack = nd->internal;
+ struct nameidata *old = current->nameidata;
+ p->stack = p->internal;
+ p->total_link_count = old ? old->total_link_count : 0;
+ current->nameidata = p;
+ return old;
}

-static void restore_nameidata(struct nameidata *nd)
+static void restore_nameidata(struct nameidata *old)
{
- if (nd->stack != nd->internal) {
- kfree(nd->stack);
- nd->stack = nd->internal;
+ struct nameidata *now = current->nameidata;
+
+ current->nameidata = old;
+ if (old)
+ old->total_link_count = now->total_link_count;
+ if (now->stack != now->internal) {
+ kfree(now->stack);
+ now->stack = now->internal;
}
}

@@ -970,7 +980,7 @@ EXPORT_SYMBOL(follow_up);
* - return -EISDIR to tell follow_managed() to stop and return the path we
* were called with.
*/
-static int follow_automount(struct path *path, unsigned flags,
+static int follow_automount(struct path *path, struct nameidata *nd,
bool *need_mntput)
{
struct vfsmount *mnt;
@@ -990,13 +1000,13 @@ static int follow_automount(struct path *path, unsigned flags,
* as being automount points. These will need the attentions
* of the daemon to instantiate them before they can be used.
*/
- if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
path->dentry->d_inode)
return -EISDIR;

- current->total_link_count++;
- if (current->total_link_count >= 40)
+ nd->total_link_count++;
+ if (nd->total_link_count >= 40)
return -ELOOP;

mnt = path->dentry->d_op->d_automount(path);
@@ -1010,7 +1020,7 @@ static int follow_automount(struct path *path, unsigned flags,
* the path being looked up; if it wasn't then the remainder of
* the path is inaccessible and we should say so.
*/
- if (PTR_ERR(mnt) == -EISDIR && (flags & LOOKUP_PARENT))
+ if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
return -EREMOTE;
return PTR_ERR(mnt);
}
@@ -1050,7 +1060,7 @@ static int follow_automount(struct path *path, unsigned flags,
*
* Serialization is taken care of in namespace.c
*/
-static int follow_managed(struct path *path, unsigned flags)
+static int follow_managed(struct path *path, struct nameidata *nd)
{
struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
unsigned managed;
@@ -1094,7 +1104,7 @@ static int follow_managed(struct path *path, unsigned flags)

/* Handle an automount point */
if (managed & DCACHE_NEED_AUTOMOUNT) {
- ret = follow_automount(path, flags, &need_mntput);
+ ret = follow_automount(path, nd, &need_mntput);
if (ret < 0)
break;
continue;
@@ -1475,7 +1485,7 @@ unlazy:

path->mnt = mnt;
path->dentry = dentry;
- err = follow_managed(path, nd->flags);
+ err = follow_managed(path, nd);
if (unlikely(err < 0)) {
path_put_conditional(path, nd);
return err;
@@ -1505,7 +1515,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
return PTR_ERR(dentry);
path->mnt = nd->path.mnt;
path->dentry = dentry;
- err = follow_managed(path, nd->flags);
+ err = follow_managed(path, nd);
if (unlikely(err < 0)) {
path_put_conditional(path, nd);
return err;
@@ -1555,7 +1565,7 @@ static void terminate_walk(struct nameidata *nd)
static int pick_link(struct nameidata *nd, struct path *link)
{
int error;
- if (unlikely(current->total_link_count++ >= MAXSYMLINKS)) {
+ if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
path_to_nameidata(link, nd);
return -ELOOP;
}
@@ -1973,7 +1983,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
rcu_read_unlock();
return -ECHILD;
done:
- current->total_link_count = 0;
+ nd->total_link_count = 0;
return link_path_walk(s, nd);
}

@@ -2079,10 +2089,9 @@ static int filename_lookup(int dfd, struct filename *name,
unsigned int flags, struct nameidata *nd)
{
int retval;
+ struct nameidata *saved_nd = set_nameidata(nd);

- set_nameidata(nd);
retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
-
if (unlikely(retval == -ECHILD))
retval = path_lookupat(dfd, name, flags, nd);
if (unlikely(retval == -ESTALE))
@@ -2090,7 +2099,7 @@ static int filename_lookup(int dfd, struct filename *name,

if (likely(!retval))
audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
- restore_nameidata(nd);
+ restore_nameidata(saved_nd);
return retval;
}

@@ -2418,11 +2427,11 @@ static int
filename_mountpoint(int dfd, struct filename *name, struct path *path,
unsigned int flags)
{
- struct nameidata nd;
+ struct nameidata nd, *saved;
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- set_nameidata(&nd);
+ saved = set_nameidata(&nd);
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
error = path_mountpoint(dfd, name, path, &nd, flags);
@@ -2430,7 +2439,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
- restore_nameidata(&nd);
+ restore_nameidata(saved);
putname(name);
return error;
}
@@ -3079,7 +3088,7 @@ retry_lookup:
if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
goto exit_dput;

- error = follow_managed(&path, nd->flags);
+ error = follow_managed(&path, nd);
if (error < 0)
goto exit_dput;

@@ -3317,31 +3326,29 @@ out:
struct file *do_filp_open(int dfd, struct filename *pathname,
const struct open_flags *op)
{
- struct nameidata nd;
+ struct nameidata nd, *saved_nd = set_nameidata(&nd);
int flags = op->lookup_flags;
struct file *filp;

- set_nameidata(&nd);
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
filp = path_openat(dfd, pathname, &nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(&nd);
+ restore_nameidata(saved_nd);
return filp;
}

struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
const char *name, const struct open_flags *op)
{
- struct nameidata nd;
+ struct nameidata nd, *saved_nd;
struct file *file;
struct filename *filename;
int flags = op->lookup_flags | LOOKUP_ROOT;

nd.root.mnt = mnt;
nd.root.dentry = dentry;
- set_nameidata(&nd);

if (d_is_symlink(dentry) && op->intent & LOOKUP_OPEN)
return ERR_PTR(-ELOOP);
@@ -3350,12 +3357,13 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
if (unlikely(IS_ERR(filename)))
return ERR_CAST(filename);

+ saved_nd = set_nameidata(&nd);
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_RCU);
if (unlikely(file == ERR_PTR(-ECHILD)))
file = path_openat(-1, filename, &nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(&nd);
+ restore_nameidata(saved_nd);
putname(filename);
return file;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8222ae4..cd24315 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1469,7 +1469,7 @@ struct task_struct {
it with task_lock())
- initialized normally by setup_new_exec */
/* file system info */
- int link_count, total_link_count;
+ struct nameidata *nameidata;
#ifdef CONFIG_SYSVIPC
/* ipc stuff */
struct sysv_sem sysvsem;
--
2.1.4
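
The save/restore discipline from 74/79 can be modelled in userspace roughly like this. It is a sketch under assumptions: toy_current_nd is a plain global standing in for current->nameidata (per-task in the kernel), struct toy_nd carries nothing but the counter, and the stack handling from the real set_nameidata()/restore_nameidata() is left out. What it shows is that a nested lookup inherits the link count of the outer one and hands the updated count back on exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata: only the counter. */
struct toy_nd {
	int total_link_count;
};

/* Stand-in for current->nameidata; a plain global in this sketch. */
static struct toy_nd *toy_current_nd;

/* Install p as the active lookup; return whatever was active before. */
static struct toy_nd *toy_set_nameidata(struct toy_nd *p)
{
	struct toy_nd *old = toy_current_nd;

	/* a nested lookup starts from the outer lookup's count */
	p->total_link_count = old ? old->total_link_count : 0;
	toy_current_nd = p;
	return old;
}

/* Reinstate the previous lookup and propagate the count back to it. */
static void toy_restore_nameidata(struct toy_nd *old)
{
	struct toy_nd *now = toy_current_nd;

	assert(now != NULL);
	toy_current_nd = old;
	if (old)
		old->total_link_count = now->total_link_count;
}
```

Callers pair the two the way filename_lookup() does in the patch: keep the return value of the set call in a local and pass it to the restore call on the way out.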

2015-05-05 05:32:56

by Al Viro

Subject: [PATCH 75/79] namei: simplify the callers of follow_managed()

From: Al Viro <[email protected]>

now that it gets nameidata, no reason to have setting LOOKUP_JUMPED on
mountpoint crossing and calling path_put_conditional() on failures
done in every caller.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 32 ++++++++++----------------------
1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5ac1079..c9c313c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1118,7 +1118,11 @@ static int follow_managed(struct path *path, struct nameidata *nd)
mntput(path->mnt);
if (ret == -EISDIR)
ret = 0;
- return ret < 0 ? ret : need_mntput;
+ if (need_mntput)
+ nd->flags |= LOOKUP_JUMPED;
+ if (unlikely(ret < 0))
+ path_put_conditional(path, nd);
+ return ret;
}

int follow_down_one(struct path *path)
@@ -1486,14 +1490,9 @@ unlazy:
path->mnt = mnt;
path->dentry = dentry;
err = follow_managed(path, nd);
- if (unlikely(err < 0)) {
- path_put_conditional(path, nd);
- return err;
- }
- if (err)
- nd->flags |= LOOKUP_JUMPED;
- *inode = path->dentry->d_inode;
- return 0;
+ if (likely(!err))
+ *inode = path->dentry->d_inode;
+ return err;

need_lookup:
return 1;
@@ -1503,7 +1502,6 @@ need_lookup:
static int lookup_slow(struct nameidata *nd, struct path *path)
{
struct dentry *dentry, *parent;
- int err;

parent = nd->path.dentry;
BUG_ON(nd->inode != parent->d_inode);
@@ -1515,14 +1513,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
return PTR_ERR(dentry);
path->mnt = nd->path.mnt;
path->dentry = dentry;
- err = follow_managed(path, nd);
- if (unlikely(err < 0)) {
- path_put_conditional(path, nd);
- return err;
- }
- if (err)
- nd->flags |= LOOKUP_JUMPED;
- return 0;
+ return follow_managed(path, nd);
}

static inline int may_lookup(struct nameidata *nd)
@@ -3090,10 +3081,7 @@ retry_lookup:

error = follow_managed(&path, nd);
if (error < 0)
- goto exit_dput;
-
- if (error)
- nd->flags |= LOOKUP_JUMPED;
+ goto out;

BUG_ON(nd->flags & LOOKUP_RCU);
inode = path.dentry->d_inode;
--
2.1.4

2015-05-05 05:32:27

by Al Viro

Subject: [PATCH 76/79] don't pass nameidata to ->follow_link()

From: Al Viro <[email protected]>

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/Locking | 2 +-
Documentation/filesystems/vfs.txt | 2 +-
drivers/staging/lustre/lustre/llite/symlink.c | 2 +-
fs/9p/vfs_inode.c | 2 +-
fs/9p/vfs_inode_dotl.c | 2 +-
fs/autofs4/symlink.c | 2 +-
fs/befs/linuxvfs.c | 4 ++--
fs/cifs/cifsfs.h | 2 +-
fs/cifs/link.c | 2 +-
fs/configfs/symlink.c | 2 +-
fs/ecryptfs/inode.c | 2 +-
fs/ext4/symlink.c | 2 +-
fs/f2fs/namei.c | 4 ++--
fs/fuse/dir.c | 2 +-
fs/gfs2/inode.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hppfs/hppfs.c | 4 ++--
fs/kernfs/symlink.c | 2 +-
fs/libfs.c | 2 +-
fs/namei.c | 11 ++++++-----
fs/nfs/symlink.c | 2 +-
fs/overlayfs/inode.c | 4 ++--
fs/proc/base.c | 4 ++--
fs/proc/inode.c | 2 +-
fs/proc/namespaces.c | 4 ++--
fs/proc/self.c | 2 +-
fs/proc/thread_self.c | 2 +-
fs/xfs/xfs_iops.c | 3 +--
include/linux/fs.h | 6 +++---
include/linux/namei.h | 2 +-
mm/shmem.c | 2 +-
31 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 7fa6c4a..5b5b4f5 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,7 +50,7 @@ prototypes:
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char *(*follow_link) (struct dentry *, void **);
void (*put_link) (struct dentry *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 1c6b03a..0dec8c8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,7 +350,7 @@ struct inode_operations {
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char *(*follow_link) (struct dentry *, void **);
void (*put_link) (struct dentry *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index da6d9d1..f3be3bf 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
return rc;
}

-static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ll_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct ptlrpc_request *request = NULL;
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 7cc70a3..271f51a 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1230,7 +1230,7 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
*
*/

-static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie)
{
struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
struct p9_fid *fid = v9fs_fid_lookup(dentry);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index ae062ff..16658ed 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -910,7 +910,7 @@ error:
*/

static const char *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie, struct nameidata *nd)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie)
{
struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index 9c6a077..da0c334 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,7 +12,7 @@

#include "autofs_i.h"

-static const char *autofs4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *autofs4_follow_link(struct dentry *dentry, void **cookie)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 3a1aefb..46aedac 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,7 +42,7 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
-static const char *befs_follow_link(struct dentry *, void **, struct nameidata *nd);
+static const char *befs_follow_link(struct dentry *, void **);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -464,7 +464,7 @@ befs_destroy_inodecache(void)
* flag is set.
*/
static const char *
-befs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+befs_follow_link(struct dentry *dentry, void **cookie)
{
struct super_block *sb = dentry->d_sb;
struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 61012da..a782b22 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
#endif

/* Functions related to symlinks */
-extern const char *cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd);
+extern const char *cifs_follow_link(struct dentry *direntry, void **cookie);
extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
int buflen);
extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 4a439c2..546f86a 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,7 +627,7 @@ cifs_hl_exit:
}

const char *
-cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd)
+cifs_follow_link(struct dentry *direntry, void **cookie)
{
struct inode *inode = d_inode(direntry);
int rc = -ENOMEM;
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index fac8e85..0ace756 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,7 +279,7 @@ static int configfs_getlink(struct dentry *dentry, char * path)

}

-static const char *configfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *configfs_follow_link(struct dentry *dentry, void **cookie)
{
unsigned long page = get_zeroed_page(GFP_KERNEL);
int error;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index cdb9d6c..73d20ae 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -675,7 +675,7 @@ out:
return rc ? ERR_PTR(rc) : buf;
}

-static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie)
{
size_t len;
char *buf = ecryptfs_readlink_lower(dentry, &len);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index f4c1c73..50494c4 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,7 +23,7 @@
#include "xattr.h"

#ifdef CONFIG_EXT4_FS_ENCRYPTION
-static const char *ext4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ext4_follow_link(struct dentry *dentry, void **cookie)
{
struct page *cpage = NULL;
char *caddr, *paddr = NULL;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index d294793..cd05a7c 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -296,9 +296,9 @@ fail:
return err;
}

-static const char *f2fs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *f2fs_follow_link(struct dentry *dentry, void **cookie)
{
- const char *link = page_follow_link_light(dentry, cookie, nd);
+ const char *link = page_follow_link_light(dentry, cookie);
if (!IS_ERR(link) && !*link) {
/* this is broken symlink case */
page_put_link(dentry, *cookie);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f9cb260..d5cdef8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1365,7 +1365,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
return err;
}

-static const char *fuse_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *fuse_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index f59390a..3a1461d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1548,7 +1548,7 @@ out:
* Returns: 0 on success or error code
*/

-static const char *gfs2_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *gfs2_follow_link(struct dentry *dentry, void **cookie)
{
struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
struct gfs2_holder i_gh;
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index f650ed6..7b6ed7a 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -892,7 +892,7 @@ static const struct inode_operations hostfs_dir_iops = {
.setattr = hostfs_setattr,
};

-static const char *hostfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *hostfs_follow_link(struct dentry *dentry, void **cookie)
{
char *link = __getname();
if (link) {
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index b8f24d3..15a774e 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,11 +642,11 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
buflen);
}

-static const char *hppfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *hppfs_follow_link(struct dentry *dentry, void **cookie)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

- return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie, nd);
+ return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie);
}

static void hppfs_put_link(struct dentry *dentry, void *cookie)
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 3c7e799..366c5a1 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,7 +112,7 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
return error;
}

-static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie)
{
int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
diff --git a/fs/libfs.c b/fs/libfs.c
index 0c83fde..c5f3373 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1091,7 +1091,7 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
}
EXPORT_SYMBOL(simple_nosetlease);

-const char *simple_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+const char *simple_follow_link(struct dentry *dentry, void **cookie)
{
return d_inode(dentry)->i_link;
}
diff --git a/fs/namei.c b/fs/namei.c
index c9c313c..98e932c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -753,8 +753,9 @@ static inline void path_to_nameidata(const struct path *path,
* Helper to directly jump to a known parsed path from ->follow_link,
* caller must have taken a reference to path beforehand.
*/
-void nd_jump_link(struct nameidata *nd, struct path *path)
+void nd_jump_link(struct path *path)
{
+ struct nameidata *nd = current->nameidata;
path_put(&nd->path);

nd->path = *path;
@@ -916,7 +917,7 @@ const char *get_link(struct nameidata *nd)
nd->last_type = LAST_BIND;
res = inode->i_link;
if (!res) {
- res = inode->i_op->follow_link(dentry, &last->cookie, nd);
+ res = inode->i_op->follow_link(dentry, &last->cookie);
if (IS_ERR(res)) {
out:
path_put(&last->link);
@@ -4479,12 +4480,12 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
int res;

if (!link) {
- link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ link = dentry->d_inode->i_op->follow_link(dentry, &cookie);
if (IS_ERR(link))
return PTR_ERR(link);
}
res = readlink_copy(buffer, buflen, link);
- if (cookie && dentry->d_inode->i_op->put_link)
+ if (dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, cookie);
return res;
}
@@ -4517,7 +4518,7 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
}
EXPORT_SYMBOL(page_readlink);

-const char *page_follow_link_light(struct dentry *dentry, void **cookie, struct nameidata *nd)
+const char *page_follow_link_light(struct dentry *dentry, void **cookie)
{
struct page *page = NULL;
char *res = page_getlink(dentry, &page);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index c992b20..b6de433 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -42,7 +42,7 @@ error:
return -EIO;
}

-static const char *nfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *nfs_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct page *page;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 235ad42..9986833 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,7 +140,7 @@ struct ovl_link_data {
void *cookie;
};

-static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ovl_follow_link(struct dentry *dentry, void **cookie)
{
struct dentry *realdentry;
struct inode *realinode;
@@ -160,7 +160,7 @@ static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct
data->realdentry = realdentry;
}

- ret = realinode->i_op->follow_link(realdentry, cookie, nd);
+ ret = realinode->i_op->follow_link(realdentry, cookie);
if (IS_ERR_OR_NULL(ret)) {
kfree(data);
return ret;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 52652f8..286a422 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1380,7 +1380,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
return -ENOENT;
}

-static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct path path;
@@ -1394,7 +1394,7 @@ static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, st
if (error)
goto out;

- nd_jump_link(nd, &path);
+ nd_jump_link(&path);
return NULL;
out:
return ERR_PTR(error);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index acd51d7..eb35874 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -393,7 +393,7 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
};
#endif

-static const char *proc_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_follow_link(struct dentry *dentry, void **cookie)
{
struct proc_dir_entry *pde = PDE(d_inode(dentry));
if (unlikely(!use_pde(pde)))
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 10d24dd..f6e8354 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
&mntns_operations,
};

-static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
@@ -45,7 +45,7 @@ static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, str
if (ptrace_may_access(task, PTRACE_MODE_READ)) {
error = ns_get_path(&ns_path, task, ns_ops);
if (!error)
- nd_jump_link(nd, &ns_path);
+ nd_jump_link(&ns_path);
}
put_task_struct(task);
return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index ad33394..113b8d0 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -18,7 +18,7 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static const char *proc_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_self_follow_link(struct dentry *dentry, void **cookie)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 85c96e0..947b0f4 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -19,7 +19,7 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 26c4dcb..7f51f39 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -416,8 +416,7 @@ xfs_vn_rename(
STATIC const char *
xfs_vn_follow_link(
struct dentry *dentry,
- void **cookie,
- struct nameidata *nd)
+ void **cookie)
{
char *link;
int error = -ENOMEM;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab9341..ed7c9f2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1608,7 +1608,7 @@ struct file_operations {

struct inode_operations {
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
- const char * (*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char * (*follow_link) (struct dentry *, void **);
int (*permission) (struct inode *, int);
struct posix_acl * (*get_acl)(struct inode *, int);

@@ -2705,7 +2705,7 @@ extern const struct file_operations generic_ro_fops;

extern int readlink_copy(char __user *, int, const char *);
extern int page_readlink(struct dentry *, char __user *, int);
-extern const char *page_follow_link_light(struct dentry *, void **, struct nameidata *);
+extern const char *page_follow_link_light(struct dentry *, void **);
extern void page_put_link(struct dentry *, void *);
extern int __page_symlink(struct inode *inode, const char *symname, int len,
int nofs);
@@ -2722,7 +2722,7 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
-const char *simple_follow_link(struct dentry *, void **, struct nameidata *);
+const char *simple_follow_link(struct dentry *, void **);
extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 3a6cc96..d756304 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -72,7 +72,7 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);

-extern void nd_jump_link(struct nameidata *nd, struct path *path);
+extern void nd_jump_link(struct path *path);

static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index d1693dc..e026822 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2475,7 +2475,7 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static const char *shmem_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *shmem_follow_link(struct dentry *dentry, void **cookie)
{
struct page *page = NULL;
int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
--
2.1.4
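
Editorial sketch for 76/79: the idea of fetching the walk state from
per-task storage instead of threading it through every ->follow_link()
can be illustrated in userspace with a thread-local pointer. Everything
named demo_* below is hypothetical and stands in for current->nameidata,
the set/restore helpers and nd_jump_link(); reference counting
(path_put() and friends) is deliberately left out, and the flag value
is arbitrary.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct demo_path {
	int mnt;
	int dentry;
};

struct demo_nameidata {
	struct demo_path path;
	unsigned flags;
	struct demo_nameidata *saved;	/* previous per-thread walk state */
};

#define DEMO_LOOKUP_JUMPED 0x1000u	/* arbitrary value for the sketch */

/* Plays the role of current->nameidata: one pointer per thread. */
static _Thread_local struct demo_nameidata *demo_current_nd;

/* Install nd as the active walk state, remembering the previous one. */
void demo_set_nameidata(struct demo_nameidata *nd)
{
	nd->saved = demo_current_nd;
	demo_current_nd = nd;
}

/* Drop the active walk state and reinstate the one it replaced. */
void demo_restore_nameidata(void)
{
	demo_current_nd = demo_current_nd->saved;
}

/*
 * Analogue of the new nd_jump_link(path): the callee no longer takes
 * the walk state as an argument, it looks it up itself.
 */
void demo_jump_link(const struct demo_path *path)
{
	struct demo_nameidata *nd = demo_current_nd;

	nd->path = *path;
	nd->flags |= DEMO_LOOKUP_JUMPED;
}
```

A caller would bracket the walk with demo_set_nameidata() and
demo_restore_nameidata(); anything invoked in between can call
demo_jump_link() without being handed the state explicitly.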

2015-05-05 05:33:05

by Al Viro

Subject: [PATCH 77/79] namei: move unlazying on symlinks towards get_link() call sites

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 45 +++++++++++++++++++++++++++++++++------------
1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 98e932c..eb3f7d9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -538,10 +538,18 @@ static void restore_nameidata(struct nameidata *old)

static int __nd_alloc_stack(struct nameidata *nd)
{
- struct saved *p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
- GFP_KERNEL);
- if (unlikely(!p))
- return -ENOMEM;
+ struct saved *p;
+ if (nd->flags & LOOKUP_RCU) {
+ p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
+ GFP_ATOMIC);
+ if (unlikely(!p))
+ return -ECHILD;
+ } else {
+ p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
+ GFP_KERNEL);
+ if (unlikely(!p))
+ return -ENOMEM;
+ }
memcpy(p, nd->internal, sizeof(nd->internal));
nd->stack = p;
return 0;
@@ -1561,12 +1569,6 @@ static int pick_link(struct nameidata *nd, struct path *link)
path_to_nameidata(link, nd);
return -ELOOP;
}
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != link->mnt ||
- unlazy_walk(nd, link->dentry))) {
- return -ECHILD;
- }
- }
error = nd_alloc_stack(nd);
if (unlikely(error)) {
path_to_nameidata(link, nd);
@@ -1839,7 +1841,17 @@ OK:
break;

if (err) {
- const char *s = get_link(nd);
+ const char *s;
+
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(nd->link.mnt != nd->path.mnt ||
+ unlazy_walk(nd, nd->link.dentry))) {
+ err = -ECHILD;
+ break;
+ }
+ }
+
+ s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
@@ -1992,7 +2004,16 @@ static void path_cleanup(struct nameidata *nd)
static int trailing_symlink(struct nameidata *nd)
{
const char *s;
- int error = may_follow_link(&nd->link, nd);
+ int error;
+
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(nd->link.mnt != nd->path.mnt ||
+ unlazy_walk(nd, nd->link.dentry))) {
+ terminate_walk(nd);
+ return -ECHILD;
+ }
+ }
+ error = may_follow_link(&nd->link, nd);
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
--
2.1.4
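
Editorial sketch for 77/79: the __nd_alloc_stack() hunk picks the error
code by mode. In RCU mode the allocation must not sleep, and a failure
is reported as -ECHILD so that the caller can retry the walk in
ref-walk mode; otherwise a failure is a plain -ENOMEM. The userspace
analogue below takes the allocator as a parameter so that the failure
path can be exercised. All demo_* names are hypothetical, the embedded
array size is an assumption, and the GFP_ATOMIC versus GFP_KERNEL
distinction is not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_LOOKUP_RCU   0x1u	/* arbitrary value for the sketch */
#define DEMO_EMBEDDED     2	/* assumed size of the on-stack array */
#define DEMO_MAXSYMLINKS  40	/* limit quoted in the cover letter */

struct demo_saved {
	int link;
	void *cookie;
};

struct demo_nd {
	unsigned flags;
	struct demo_saved internal[DEMO_EMBEDDED];
	struct demo_saved *stack;
};

typedef void *(*demo_alloc_fn)(size_t);

/*
 * Grow the symlink stack from the embedded array to a heap array.
 * On allocation failure the error code depends on the walk mode:
 * -ECHILD asks the caller to retry without RCU, -ENOMEM is final.
 */
int demo_alloc_stack(struct demo_nd *nd, demo_alloc_fn alloc)
{
	struct demo_saved *p = alloc(DEMO_MAXSYMLINKS * sizeof(*p));

	if (!p)
		return (nd->flags & DEMO_LOOKUP_RCU) ? -ECHILD : -ENOMEM;
	memcpy(p, nd->internal, sizeof(nd->internal));
	nd->stack = p;
	return 0;
}

/* Test helper: an allocator that always fails. */
void *demo_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```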

2015-05-05 05:32:17

by Al Viro

Subject: [PATCH 78/79] namei: new helper - is_stale()

From: Al Viro <[email protected]>

We have a lot of places where we open-code the check that dentry->d_seq
still matches the sequence number recorded in nameidata; wrap that check
in a helper.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index eb3f7d9..cf8f2de 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -575,6 +575,11 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* to restart the path walk from the beginning in ref-walk mode.
*/

+static inline bool is_stale(struct nameidata *nd, struct dentry *dentry)
+{
+ return read_seqcount_retry(&dentry->d_seq, nd->seq);
+}
+
/**
* unlazy_walk - try to switch to ref-walk mode.
* @nd: nameidata pathwalk data
@@ -621,13 +626,13 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
* be valid if the child sequence number is still valid.
*/
if (!dentry) {
- if (read_seqcount_retry(&parent->d_seq, nd->seq))
+ if (is_stale(nd, parent))
goto out;
BUG_ON(nd->inode != parent->d_inode);
} else {
if (!lockref_get_not_dead(&dentry->d_lockref))
goto out;
- if (read_seqcount_retry(&dentry->d_seq, nd->seq))
+ if (is_stale(nd, dentry))
goto drop_dentry;
}

@@ -694,7 +699,7 @@ static int complete_walk(struct nameidata *nd)
mntput(nd->path.mnt);
return -ECHILD;
}
- if (read_seqcount_retry(&dentry->d_seq, nd->seq)) {
+ if (is_stale(nd, dentry)) {
rcu_read_unlock();
dput(dentry);
mntput(nd->path.mnt);
@@ -1218,7 +1223,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)

inode = parent->d_inode;
seq = read_seqcount_begin(&parent->d_seq);
- if (read_seqcount_retry(&old->d_seq, nd->seq))
+ if (is_stale(nd, old))
goto failed;
nd->path.dentry = parent;
nd->seq = seq;
@@ -1980,7 +1985,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
nd->inode = nd->path.dentry->d_inode;
if (!(flags & LOOKUP_RCU))
goto done;
- if (likely(!read_seqcount_retry(&nd->path.dentry->d_seq, nd->seq)))
+ if (likely(!is_stale(nd, nd->path.dentry)))
goto done;
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
--
2.1.4
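
Editorial sketch for 78/79: the helper wraps the "has this object
changed since I sampled its sequence count?" test. A single-threaded
userspace model of that test, using C11 atomics, might look like the
following. The demo_* names are hypothetical and the memory ordering is
simplified to the default sequentially consistent operations, so this
illustrates the check only and is not a replacement for the kernel's
seqcount_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_seqcount {
	atomic_uint sequence;	/* odd while a writer is mid-update */
};

struct demo_dentry {
	struct demo_seqcount d_seq;
};

struct demo_nameidata {
	unsigned seq;		/* value sampled when the walk reached the dentry */
};

void demo_seqcount_init(struct demo_seqcount *s)
{
	atomic_init(&s->sequence, 0);
}

/* Sample the counter, waiting out any writer that is in progress. */
unsigned demo_read_begin(struct demo_seqcount *s)
{
	unsigned seq;

	while ((seq = atomic_load(&s->sequence)) & 1u)
		;
	return seq;
}

/* True if a writer has touched the object since 'start' was sampled. */
bool demo_read_retry(struct demo_seqcount *s, unsigned start)
{
	return atomic_load(&s->sequence) != start;
}

void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

void demo_write_end(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Same shape as the is_stale() helper added by the patch. */
bool demo_is_stale(struct demo_nameidata *nd, struct demo_dentry *dentry)
{
	return demo_read_retry(&dentry->d_seq, nd->seq);
}
```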

2015-05-05 05:32:35

by Al Viro

Subject: [PATCH 79/79] namei: stretch RCU mode into get_link()

From: Al Viro <[email protected]>

One potentially subtle point: when the may_follow_link() kludge wants to
reject a symlink traversal while we are in RCU mode, it returns -ECHILD
and everything is repeated in non-RCU mode. Reason:
audit_log_link_denied() won't work in RCU mode.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 86 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cf8f2de..761d4b1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -725,6 +725,29 @@ static int complete_walk(struct nameidata *nd)
return status;
}

+static inline void put_link(struct nameidata *nd)
+{
+ struct saved *last = nd->stack + --nd->depth;
+ struct inode *inode = last->link.dentry->d_inode;
+ if (last->cookie && inode->i_op->put_link)
+ inode->i_op->put_link(last->link.dentry, last->cookie);
+ path_put(&last->link);
+}
+
+static void terminate_walk(struct nameidata *nd)
+{
+ if (!(nd->flags & LOOKUP_RCU)) {
+ path_put(&nd->path);
+ } else {
+ nd->flags &= ~LOOKUP_RCU;
+ if (!(nd->flags & LOOKUP_ROOT))
+ nd->root.mnt = NULL;
+ rcu_read_unlock();
+ }
+ while (unlikely(nd->depth))
+ put_link(nd);
+}
+
static __always_inline void set_root(struct nameidata *nd)
{
get_fs_root(current->fs, &nd->root);
@@ -776,15 +799,6 @@ void nd_jump_link(struct path *path)
nd->flags |= LOOKUP_JUMPED;
}

-static inline void put_link(struct nameidata *nd)
-{
- struct saved *last = nd->stack + --nd->depth;
- struct inode *inode = last->link.dentry->d_inode;
- if (last->cookie && inode->i_op->put_link)
- inode->i_op->put_link(last->link.dentry, last->cookie);
- path_put(&last->link);
-}
-
int sysctl_protected_symlinks __read_mostly = 0;
int sysctl_protected_hardlinks __read_mostly = 0;

@@ -804,21 +818,20 @@ int sysctl_protected_hardlinks __read_mostly = 0;
*
* Returns 0 if following the symlink is allowed, -ve on error.
*/
-static inline int may_follow_link(struct path *link, struct nameidata *nd)
+static inline int may_follow_link(struct path *link, const struct inode *inode,
+ struct nameidata *nd)
{
- const struct inode *inode;
const struct inode *parent;

if (!sysctl_protected_symlinks)
return 0;

/* Allowed if owner and follower match. */
- inode = link->dentry->d_inode;
if (uid_eq(current_cred()->fsuid, inode->i_uid))
return 0;

/* Allowed if parent directory not sticky and world-writable. */
- parent = nd->path.dentry->d_inode;
+ parent = nd->inode;
if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
return 0;

@@ -826,9 +839,14 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
if (uid_eq(parent->i_uid, inode->i_uid))
return 0;

+ if (nd->flags & LOOKUP_RCU) {
+ terminate_walk(nd);
+ return -ECHILD;
+ }
+
audit_log_link_denied("follow_link", link);
- path_put_conditional(link, nd);
- path_put(&nd->path);
+ path_to_nameidata(link, nd);
+ terminate_walk(nd);
return -EACCES;
}

@@ -902,15 +920,19 @@ static int may_linkat(struct path *link)
}

static __always_inline
-const char *get_link(struct nameidata *nd)
+const char *get_link(struct nameidata *nd, struct inode *inode)
{
struct saved *last = nd->stack + nd->depth;
struct dentry *dentry = nd->link.dentry;
- struct inode *inode = dentry->d_inode;
int error;
const char *res;

- BUG_ON(nd->flags & LOOKUP_RCU);
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(nd->link.mnt != nd->path.mnt ||
+ unlazy_walk(nd, dentry))) {
+ return ERR_PTR(-ECHILD);
+ }
+ }

if (nd->link.mnt == nd->path.mnt)
mntget(nd->link.mnt);
@@ -1553,20 +1575,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
return 0;
}

-static void terminate_walk(struct nameidata *nd)
-{
- if (!(nd->flags & LOOKUP_RCU)) {
- path_put(&nd->path);
- } else {
- nd->flags &= ~LOOKUP_RCU;
- if (!(nd->flags & LOOKUP_ROOT))
- nd->root.mnt = NULL;
- rcu_read_unlock();
- }
- while (unlikely(nd->depth))
- put_link(nd);
-}
-
static int pick_link(struct nameidata *nd, struct path *link)
{
int error;
@@ -1847,16 +1855,16 @@ OK:

if (err) {
const char *s;
+ struct inode *inode = nd->link.dentry->d_inode;

if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->link.mnt != nd->path.mnt ||
- unlazy_walk(nd, nd->link.dentry))) {
+ if (unlikely(is_stale(nd, nd->link.dentry))) {
err = -ECHILD;
break;
}
}

- s = get_link(nd);
+ s = get_link(nd, inode);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
@@ -2009,20 +2017,20 @@ static void path_cleanup(struct nameidata *nd)
static int trailing_symlink(struct nameidata *nd)
{
const char *s;
+ struct inode *inode = nd->link.dentry->d_inode;
int error;

if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->link.mnt != nd->path.mnt ||
- unlazy_walk(nd, nd->link.dentry))) {
+ if (unlikely(is_stale(nd, nd->link.dentry))) {
terminate_walk(nd);
return -ECHILD;
}
}
- error = may_follow_link(&nd->link, nd);
+ error = may_follow_link(&nd->link, inode, nd);
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
- s = get_link(nd);
+ s = get_link(nd, inode);
if (unlikely(IS_ERR(s))) {
terminate_walk(nd);
return PTR_ERR(s);
--
2.1.4
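
Editorial sketch for 79/79: the -ECHILD convention described in the
commit message amounts to "try the fast mode first; if it cannot finish
the job, bail out with -ECHILD and redo the walk in the slow mode". A
toy userspace model, with hypothetical demo_* names and a made-up
needs_audit field standing in for "may_follow_link() wants to deny and
log":

```c
#include <errno.h>

#define DEMO_LOOKUP_RCU 0x1u	/* arbitrary value for the sketch */

struct demo_walk {
	int needs_audit;	/* the denial must be logged, which may block */
	int attempts;		/* how many times the walk was started */
};

/*
 * One pass of the walk. In the fast (RCU-like) mode we may not do
 * anything that blocks, so a denial that has to be audited cannot be
 * reported from here; ask for a retry instead.
 */
int demo_path_walk(struct demo_walk *w, unsigned flags)
{
	w->attempts++;
	if (w->needs_audit) {
		if (flags & DEMO_LOOKUP_RCU)
			return -ECHILD;	/* retry in the slow mode */
		return -EACCES;		/* slow mode: log and deny */
	}
	return 0;
}

/* Caller side: fast attempt first, slow attempt only on -ECHILD. */
int demo_lookup(struct demo_walk *w)
{
	int err = demo_path_walk(w, DEMO_LOOKUP_RCU);

	if (err == -ECHILD)
		err = demo_path_walk(w, 0);
	return err;
}
```

The common case costs one pass; only the rare denial pays for a second,
slower one.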

2015-05-05 07:13:24

by Hillf Danton

Subject: Re: [PATCH 03/79] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link

>
> From: NeilBrown <[email protected]>
>
> ovl_follow_link currently calls ->put_link on an error path.
> However ->put_link is about to change in a way that it will be
> impossible to call it from ovl_follow_link.
>
> So rearrange the code to avoid the need for that error path.
> Specifically: move the kmalloc() call before the ->follow_link()
> call to the subordinate filesystem.
>
> Signed-off-by: NeilBrown <[email protected]>
> Signed-off-by: Al Viro <[email protected]>
> ---
> fs/overlayfs/inode.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 04f1248..1b4b9c5e 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
> void *ret;
> struct dentry *realdentry;
> struct inode *realinode;
> + struct ovl_link_data *data = NULL;
>
> realdentry = ovl_dentry_real(dentry);
> realinode = realdentry->d_inode;
> @@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
> if (WARN_ON(!realinode->i_op->follow_link))
> return ERR_PTR(-EPERM);
>
> - ret = realinode->i_op->follow_link(realdentry, nd);
> - if (IS_ERR(ret))
> - return ret;
> -
> if (realinode->i_op->put_link) {
> - struct ovl_link_data *data;
> -
> data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
> - if (!data) {
> - realinode->i_op->put_link(realdentry, nd, ret);
> + if (!data)
> return ERR_PTR(-ENOMEM);
> - }
> data->realdentry = realdentry;
> - data->cookie = ret;
> + }
>
> - return data;
> - } else {
> - return NULL;
> + ret = realinode->i_op->follow_link(realdentry, nd);
> + if (IS_ERR(ret)) {
> + kfree(data);
> + return ret;
> }
> +
> + if (data)

No need to check again (see the kfree(data) above, please).

> + data->cookie = ret;
> +
> + return data;
> }
>
> static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
> --
> 2.1.4

2015-05-05 08:34:25

by NeilBrown

Subject: Re: [PATCH 03/79] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link

On Tue, 05 May 2015 15:12:28 +0800 "Hillf Danton" <[email protected]>
wrote:

> >
> > From: NeilBrown <[email protected]>
> >
> > ovl_follow_link currently calls ->put_link on an error path.
> > However ->put_link is about to change in a way that it will be
> > impossible to call it from ovl_follow_link.
> >
> > So rearrange the code to avoid the need for that error path.
> > Specifically: move the kmalloc() call before the ->follow_link()
> > call to the subordinate filesystem.
> >
> > Signed-off-by: NeilBrown <[email protected]>
> > Signed-off-by: Al Viro <[email protected]>
> > ---
> > fs/overlayfs/inode.c | 25 ++++++++++++-------------
> > 1 file changed, 12 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 04f1248..1b4b9c5e 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
> > void *ret;
> > struct dentry *realdentry;
> > struct inode *realinode;
> > + struct ovl_link_data *data = NULL;
> >
> > realdentry = ovl_dentry_real(dentry);
> > realinode = realdentry->d_inode;
> > @@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
> > if (WARN_ON(!realinode->i_op->follow_link))
> > return ERR_PTR(-EPERM);
> >
> > - ret = realinode->i_op->follow_link(realdentry, nd);
> > - if (IS_ERR(ret))
> > - return ret;
> > -
> > if (realinode->i_op->put_link) {
> > - struct ovl_link_data *data;
> > -
> > data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
> > - if (!data) {
> > - realinode->i_op->put_link(realdentry, nd, ret);
> > + if (!data)
> > return ERR_PTR(-ENOMEM);
> > - }
> > data->realdentry = realdentry;
> > - data->cookie = ret;
> > + }
> >
> > - return data;
> > - } else {
> > - return NULL;
> > + ret = realinode->i_op->follow_link(realdentry, nd);
> > + if (IS_ERR(ret)) {
> > + kfree(data);
> > + return ret;
> > }
> > +
> > + if (data)
>
> No need to check again (see the kfree(data) above, please).

kfree(NULL) is perfectly valid.

NeilBrown


>
> > + data->cookie = ret;
> > +
> > + return data;
> > }
> >
> > static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
> > --
> > 2.1.4
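
The ordering discussed in this exchange can be sketched in userspace C. This is only an illustration under stated assumptions, not the overlayfs code: `struct link_data`, `follow_link_sketch()`, `lower_ok()` and `lower_fail()` are hypothetical stand-ins for `struct ovl_link_data`, `ovl_follow_link()` and the lower filesystem's `->follow_link()`; `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`; and a plain errno return replaces `ERR_PTR()`. Like `kfree(NULL)`, `free(NULL)` is a no-op, so the error path can free unconditionally, while the later `if (data)` test is still needed because `data` stays NULL when there is no `->put_link()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ovl_link_data. */
struct link_data {
	void *cookie;		/* value handed back by the lower follow_link */
};

/* Hypothetical lower ->follow_link(): 0 plus a cookie, or a negative errno. */
typedef int (*lower_follow_t)(void **cookie);

static int toy_cookie;

static int lower_ok(void **cookie)
{
	*cookie = &toy_cookie;
	return 0;
}

static int lower_fail(void **cookie)
{
	(void)cookie;
	return -EIO;
}

/*
 * Allocate first, then call down. If the lower call fails there is
 * nothing to undo in the lower filesystem; only our own allocation
 * has to be released.
 */
static int follow_link_sketch(lower_follow_t lower_follow, int has_put_link,
			      struct link_data **out)
{
	struct link_data *data = NULL;
	void *cookie = NULL;
	int err;

	assert(out != NULL);

	if (has_put_link) {
		data = malloc(sizeof(*data));
		if (!data)
			return -ENOMEM;
	}

	err = lower_follow(&cookie);
	if (err) {
		free(data);	/* fine even when data == NULL */
		return err;
	}

	if (data)		/* still NULL when there is no put_link */
		data->cookie = cookie;

	*out = data;
	return 0;
}
```

Calling `follow_link_sketch(lower_ok, 0, &d)` leaves `d` set to NULL, which is the case the second `if (data)` test protects.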


2015-05-05 09:10:42

by Jan Kara

Subject: Re: [PATCH 04/79] ext4: split inode_operations for encrypted symlinks off the rest

On Tue 05-05-15 06:21:38, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/ext4.h | 1 +
> fs/ext4/inode.c | 10 ++++++++--
> fs/ext4/namei.c | 9 +++++----
> fs/ext4/symlink.c | 30 ++++++++++--------------------
> 4 files changed, 24 insertions(+), 26 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index ef267ad..2640913 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2849,6 +2849,7 @@ extern int ext4_mpage_readpages(struct address_space *mapping,
> unsigned nr_pages);
>
> /* symlink.c */
> +extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
> extern const struct inode_operations ext4_symlink_inode_operations;
> extern const struct inode_operations ext4_fast_symlink_inode_operations;
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cbd0654..a320558 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4211,8 +4211,14 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> inode->i_op = &ext4_dir_inode_operations;
> inode->i_fop = &ext4_dir_operations;
> } else if (S_ISLNK(inode->i_mode)) {
> - if (ext4_inode_is_fast_symlink(inode) &&
> - !ext4_encrypted_inode(inode)) {
> + if (ext4_encrypted_inode(inode)) {
> +#ifdef CONFIG_EXT4_FS_ENCRYPTION
> + inode->i_op = &ext4_encrypted_symlink_inode_operations;
> + ext4_set_aops(inode);
> +#else
> + BUILD_BUG();
> +#endif
> + } else if (ext4_inode_is_fast_symlink(inode)) {
> inode->i_op = &ext4_fast_symlink_inode_operations;
> nd_terminate_link(ei->i_data, inode->i_size,
> sizeof(ei->i_data) - 1);
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 7223b0b..84b1920 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3266,10 +3266,12 @@ static int ext4_symlink(struct inode *dir,
> goto err_drop_inode;
> sd->len = cpu_to_le16(ostr.len);
> disk_link.name = (char *) sd;
> + inode->i_op = &ext4_encrypted_symlink_inode_operations;
> }
>
> if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
> - inode->i_op = &ext4_symlink_inode_operations;
> + if (!encryption_required)
> + inode->i_op = &ext4_symlink_inode_operations;
> ext4_set_aops(inode);
> /*
> * We cannot call page_symlink() with transaction started
> @@ -3309,9 +3311,8 @@ static int ext4_symlink(struct inode *dir,
> } else {
> /* clear the extent format for fast symlink */
> ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
> - inode->i_op = encryption_required ?
> - &ext4_symlink_inode_operations :
> - &ext4_fast_symlink_inode_operations;
> + if (!encryption_required)
> + inode->i_op = &ext4_fast_symlink_inode_operations;
> memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
> disk_link.len);
> inode->i_size = disk_link.len - 1;
> diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
> index 19f78f2..4ea5a01 100644
> --- a/fs/ext4/symlink.c
> +++ b/fs/ext4/symlink.c
> @@ -35,9 +35,6 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
> int res;
> u32 plen, max_size = inode->i_sb->s_blocksize;
>
> - if (!ext4_encrypted_inode(inode))
> - return page_follow_link_light(dentry, nd);
> -
> ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize);
> if (IS_ERR(ctx))
> return ctx;
> @@ -97,18 +94,16 @@ errout:
> return ERR_PTR(res);
> }
>
> -static void ext4_put_link(struct dentry *dentry, struct nameidata *nd,
> - void *cookie)
> -{
> - struct page *page = cookie;
> -
> - if (!page) {
> - kfree(nd_get_link(nd));
> - } else {
> - kunmap(page);
> - page_cache_release(page);
> - }
> -}
> +const struct inode_operations ext4_encrypted_symlink_inode_operations = {
> + .readlink = generic_readlink,
> + .follow_link = ext4_follow_link,
> + .put_link = kfree_put_link,
> + .setattr = ext4_setattr,
> + .setxattr = generic_setxattr,
> + .getxattr = generic_getxattr,
> + .listxattr = ext4_listxattr,
> + .removexattr = generic_removexattr,
> +};
> #endif
>
> static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
> @@ -120,13 +115,8 @@ static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
>
> const struct inode_operations ext4_symlink_inode_operations = {
> .readlink = generic_readlink,
> -#ifdef CONFIG_EXT4_FS_ENCRYPTION
> - .follow_link = ext4_follow_link,
> - .put_link = ext4_put_link,
> -#else
> .follow_link = page_follow_link_light,
> .put_link = page_put_link,
> -#endif
> .setattr = ext4_setattr,
> .setxattr = generic_setxattr,
> .getxattr = generic_getxattr,
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:17:58

by Jan Kara

Subject: Re: [PATCH 06/79] ext2: use simple_follow_link()

On Tue 05-05-15 06:21:40, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext2/inode.c | 1 +
> fs/ext2/namei.c | 3 ++-
> fs/ext2/symlink.c | 10 +---------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index f460ae3..5c09776 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -1403,6 +1403,7 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
> inode->i_mapping->a_ops = &ext2_aops;
> } else if (S_ISLNK(inode->i_mode)) {
> if (ext2_inode_is_fast_symlink(inode)) {
> + inode->i_link = (char *)ei->i_data;
> inode->i_op = &ext2_fast_symlink_inode_operations;
> nd_terminate_link(ei->i_data, inode->i_size,
> sizeof(ei->i_data) - 1);
> diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
> index 3e074a9..13ec54a 100644
> --- a/fs/ext2/namei.c
> +++ b/fs/ext2/namei.c
> @@ -189,7 +189,8 @@ static int ext2_symlink (struct inode * dir, struct dentry * dentry,
> } else {
> /* fast symlink */
> inode->i_op = &ext2_fast_symlink_inode_operations;
> - memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
> + inode->i_link = (char*)EXT2_I(inode)->i_data;
> + memcpy(inode->i_link, symname, l);
> inode->i_size = l-1;
> }
> mark_inode_dirty(inode);
> diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
> index 20608f1..ae17179 100644
> --- a/fs/ext2/symlink.c
> +++ b/fs/ext2/symlink.c
> @@ -19,14 +19,6 @@
>
> #include "ext2.h"
> #include "xattr.h"
> -#include <linux/namei.h>
> -
> -static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - struct ext2_inode_info *ei = EXT2_I(d_inode(dentry));
> - nd_set_link(nd, (char *)ei->i_data);
> - return NULL;
> -}
>
> const struct inode_operations ext2_symlink_inode_operations = {
> .readlink = generic_readlink,
> @@ -43,7 +35,7 @@ const struct inode_operations ext2_symlink_inode_operations = {
>
> const struct inode_operations ext2_fast_symlink_inode_operations = {
> .readlink = generic_readlink,
> - .follow_link = ext2_follow_link,
> + .follow_link = simple_follow_link,
> .setattr = ext2_setattr,
> #ifdef CONFIG_EXT2_FS_XATTR
> .setxattr = generic_setxattr,
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
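
The fast-symlink handling this patch touches can be sketched in userspace C. This is a hedged illustration, not the ext2 code: `struct sketch_inode`, `INLINE_BYTES`, `terminate_link()`, `store_fast_symlink()` and `load_fast_symlink()` are hypothetical names, and the 60-byte inline area is an assumption. The creation side points `i_link` at the inline area and copies the target there; the read side forces a NUL terminator so that a corrupted on-disk size cannot make the string run off the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INLINE_BYTES 60	/* assumed size of the in-inode area */

/* Hypothetical, pared-down inode: only what the sketch needs. */
struct sketch_inode {
	char i_data[INLINE_BYTES];	/* inline ("fast") symlink body */
	char *i_link;			/* points at the body, or NULL */
	size_t i_size;			/* body length without the NUL */
};

/* Same idea as nd_terminate_link(): clamp the length, then force a NUL. */
static void terminate_link(char *name, size_t len, size_t maxlen)
{
	name[len < maxlen ? len : maxlen] = '\0';
}

/*
 * Creation side. l counts the trailing NUL, as in the quoted diff.
 * Returns 0 if the target fits inline, -1 if it needs the slow path.
 */
static int store_fast_symlink(struct sketch_inode *inode, const char *symname)
{
	size_t l = strlen(symname) + 1;

	assert(inode != NULL);
	if (l > sizeof(inode->i_data))
		return -1;
	inode->i_link = inode->i_data;
	memcpy(inode->i_link, symname, l);
	inode->i_size = l - 1;
	return 0;
}

/* Read side: what an iget-style loader does with bytes read from disk. */
static void load_fast_symlink(struct sketch_inode *inode)
{
	assert(inode != NULL);
	inode->i_link = inode->i_data;
	terminate_link(inode->i_data, inode->i_size,
		       sizeof(inode->i_data) - 1);
}
```

With `i_link` set up once in these two places, the per-filesystem `->follow_link()` has nothing left to compute, which is what lets a generic helper replace it.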

2015-05-05 09:19:57

by Jan Kara

Subject: Re: [PATCH 08/79] ext3: switch to simple_follow_link()

On Tue 05-05-15 06:21:42, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext3/inode.c | 1 +
> fs/ext3/namei.c | 3 ++-
> fs/ext3/symlink.c | 10 +---------
> 3 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index 2ee2dc4..6c7e546 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -2999,6 +2999,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
> inode->i_op = &ext3_fast_symlink_inode_operations;
> nd_terminate_link(ei->i_data, inode->i_size,
> sizeof(ei->i_data) - 1);
> + inode->i_link = (char *)ei->i_data;
> } else {
> inode->i_op = &ext3_symlink_inode_operations;
> ext3_set_aops(inode);
> diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
> index 4264b9b..c9e767c 100644
> --- a/fs/ext3/namei.c
> +++ b/fs/ext3/namei.c
> @@ -2308,7 +2308,8 @@ retry:
> }
> } else {
> inode->i_op = &ext3_fast_symlink_inode_operations;
> - memcpy((char*)&EXT3_I(inode)->i_data,symname,l);
> + inode->i_link = (char*)&EXT3_I(inode)->i_data;
> + memcpy(inode->i_link, symname, l);
> inode->i_size = l-1;
> }
> EXT3_I(inode)->i_disksize = inode->i_size;
> diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
> index ea96df3..c08c590 100644
> --- a/fs/ext3/symlink.c
> +++ b/fs/ext3/symlink.c
> @@ -17,17 +17,9 @@
> * ext3 symlink handling code
> */
>
> -#include <linux/namei.h>
> #include "ext3.h"
> #include "xattr.h"
>
> -static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - struct ext3_inode_info *ei = EXT3_I(d_inode(dentry));
> - nd_set_link(nd, (char*)ei->i_data);
> - return NULL;
> -}
> -
> const struct inode_operations ext3_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = page_follow_link_light,
> @@ -43,7 +35,7 @@ const struct inode_operations ext3_symlink_inode_operations = {
>
> const struct inode_operations ext3_fast_symlink_inode_operations = {
> .readlink = generic_readlink,
> - .follow_link = ext3_follow_link,
> + .follow_link = simple_follow_link,
> .setattr = ext3_setattr,
> #ifdef CONFIG_EXT3_FS_XATTR
> .setxattr = generic_setxattr,
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:20:13

by Jan Kara

Subject: Re: [PATCH 05/79] libfs: simple_follow_link()

On Tue 05-05-15 06:21:39, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> let "fast" symlinks store the pointer to the body into ->i_link and
> use simple_follow_link for ->follow_link()
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/inode.c | 1 +
> fs/libfs.c | 13 +++++++++++++
> include/linux/fs.h | 3 +++
> 3 files changed, 17 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index ea37cd1..952fb48 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -152,6 +152,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> inode->i_pipe = NULL;
> inode->i_bdev = NULL;
> inode->i_cdev = NULL;
> + inode->i_link = NULL;
> inode->i_rdev = 0;
> inode->dirtied_when = 0;
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index cb1fb4b..72e4e01 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1093,3 +1093,16 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
> return -EINVAL;
> }
> EXPORT_SYMBOL(simple_nosetlease);
> +
> +void *simple_follow_link(struct dentry *dentry, struct nameidata *nd)
> +{
> + nd_set_link(nd, d_inode(dentry)->i_link);
> + return NULL;
> +}
> +EXPORT_SYMBOL(simple_follow_link);
> +
> +const struct inode_operations simple_symlink_inode_operations = {
> + .follow_link = simple_follow_link,
> + .readlink = generic_readlink
> +};
> +EXPORT_SYMBOL(simple_symlink_inode_operations);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 35ec87e..0ac758f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -656,6 +656,7 @@ struct inode {
> struct pipe_inode_info *i_pipe;
> struct block_device *i_bdev;
> struct cdev *i_cdev;
> + char *i_link;
> };
>
> __u32 i_generation;
> @@ -2721,6 +2722,8 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
> void inode_sub_bytes(struct inode *inode, loff_t bytes);
> loff_t inode_get_bytes(struct inode *inode);
> void inode_set_bytes(struct inode *inode, loff_t bytes);
> +void *simple_follow_link(struct dentry *, struct nameidata *);
> +extern const struct inode_operations simple_symlink_inode_operations;
>
> extern int iterate_dir(struct file *, struct dir_context *);
>
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
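
The idea behind `simple_follow_link()` can be sketched in userspace C as follows. The names here (`struct sketch_inode`, `struct sketch_inode_ops`, `sketch_simple_follow_link()`, `sketch_init_fast_symlink()`) are hypothetical, and the sketch simplifies the interface: the real helper passes the body to `nd_set_link()` and returns a NULL cookie, whereas this version just returns the string. The point it tries to show is that once every filesystem stores a pointer to the symlink body in the inode, one shared method can serve them all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct sketch_inode;

struct sketch_inode_ops {
	const char *(*follow_link)(struct sketch_inode *inode);
};

struct sketch_inode {
	const struct sketch_inode_ops *i_op;
	char *i_link;	/* set once, when the inode is created or read in */
};

/* One generic method replaces a per-filesystem copy of the same code. */
static const char *sketch_simple_follow_link(struct sketch_inode *inode)
{
	assert(inode != NULL);
	return inode->i_link;
}

static const struct sketch_inode_ops sketch_simple_symlink_ops = {
	.follow_link = sketch_simple_follow_link,
};

/* What a filesystem would do in its iget/symlink path after converting. */
static void sketch_init_fast_symlink(struct sketch_inode *inode, char *body)
{
	assert(inode != NULL);
	inode->i_link = body;
	inode->i_op = &sketch_simple_symlink_ops;
}
```

A filesystem that needs no other symlink methods can point `i_op` at the shared table directly, as the befs and debugfs conversions in this series do with `simple_symlink_inode_operations`.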

2015-05-05 09:21:28

by Jan Kara

Subject: Re: [PATCH 07/79] befs: switch to simple_follow_link()

On Tue 05-05-15 06:21:41, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/befs/linuxvfs.c | 24 +++++-------------------
> 1 file changed, 5 insertions(+), 19 deletions(-)
>
> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
> index 7943533..172e306 100644
> --- a/fs/befs/linuxvfs.c
> +++ b/fs/befs/linuxvfs.c
> @@ -43,7 +43,6 @@ static struct inode *befs_alloc_inode(struct super_block *sb);
> static void befs_destroy_inode(struct inode *inode);
> static void befs_destroy_inodecache(void);
> static void *befs_follow_link(struct dentry *, struct nameidata *);
> -static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
> static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
> char **out, int *out_len);
> static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
> @@ -80,11 +79,6 @@ static const struct address_space_operations befs_aops = {
> .bmap = befs_bmap,
> };
>
> -static const struct inode_operations befs_fast_symlink_inode_operations = {
> - .readlink = generic_readlink,
> - .follow_link = befs_fast_follow_link,
> -};
> -
> static const struct inode_operations befs_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = befs_follow_link,
> @@ -403,10 +397,12 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
> inode->i_op = &befs_dir_inode_operations;
> inode->i_fop = &befs_dir_operations;
> } else if (S_ISLNK(inode->i_mode)) {
> - if (befs_ino->i_flags & BEFS_LONG_SYMLINK)
> + if (befs_ino->i_flags & BEFS_LONG_SYMLINK) {
> inode->i_op = &befs_symlink_inode_operations;
> - else
> - inode->i_op = &befs_fast_symlink_inode_operations;
> + } else {
> + inode->i_link = befs_ino->i_data.symlink;
> + inode->i_op = &simple_symlink_inode_operations;
> + }
> } else {
> befs_error(sb, "Inode %lu is not a regular file, "
> "directory or symlink. THAT IS WRONG! BeFS has no "
> @@ -497,16 +493,6 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
> return NULL;
> }
>
> -
> -static void *
> -befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
> -
> - nd_set_link(nd, befs_ino->i_data.symlink);
> - return NULL;
> -}
> -
> /*
> * UTF-8 to NLS charset convert routine
> *
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:23:07

by Jan Kara

Subject: Re: [PATCH 09/79] ext4: switch to simple_follow_link()

On Tue 05-05-15 06:21:43, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> for fast symlinks only, of course...
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 1 +
> fs/ext4/namei.c | 4 +++-
> fs/ext4/symlink.c | 9 +--------
> 3 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index a320558..aba562b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4219,6 +4219,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> BUILD_BUG();
> #endif
> } else if (ext4_inode_is_fast_symlink(inode)) {
> + inode->i_link = (char *)ei->i_data;
> inode->i_op = &ext4_fast_symlink_inode_operations;
> nd_terminate_link(ei->i_data, inode->i_size,
> sizeof(ei->i_data) - 1);
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 84b1920..99017eb 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3311,8 +3311,10 @@ static int ext4_symlink(struct inode *dir,
> } else {
> /* clear the extent format for fast symlink */
> ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
> - if (!encryption_required)
> + if (!encryption_required) {
> inode->i_op = &ext4_fast_symlink_inode_operations;
> + inode->i_link = (char *)&EXT4_I(inode)->i_data;
> + }
> memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
> disk_link.len);
> inode->i_size = disk_link.len - 1;
> diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
> index 4ea5a01..1720a13 100644
> --- a/fs/ext4/symlink.c
> +++ b/fs/ext4/symlink.c
> @@ -106,13 +106,6 @@ const struct inode_operations ext4_encrypted_symlink_inode_operations = {
> };
> #endif
>
> -static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - struct ext4_inode_info *ei = EXT4_I(d_inode(dentry));
> - nd_set_link(nd, (char *) ei->i_data);
> - return NULL;
> -}
> -
> const struct inode_operations ext4_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = page_follow_link_light,
> @@ -126,7 +119,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
>
> const struct inode_operations ext4_fast_symlink_inode_operations = {
> .readlink = generic_readlink,
> - .follow_link = ext4_follow_fast_link,
> + .follow_link = simple_follow_link,
> .setattr = ext4_setattr,
> .setxattr = generic_setxattr,
> .getxattr = generic_getxattr,
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:25:10

by Jan Kara

Subject: Re: [PATCH 11/79] shmem: switch to simple_follow_link()

On Tue 05-05-15 06:21:45, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> mm/shmem.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index de98137..7f6e2f8 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2451,6 +2451,7 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
> return -ENOMEM;
> }
> inode->i_op = &shmem_short_symlink_operations;
> + inode->i_link = info->symlink;
> } else {
> error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
> if (error) {
> @@ -2474,12 +2475,6 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
> return 0;
> }
>
> -static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
> -{
> - nd_set_link(nd, SHMEM_I(d_inode(dentry))->symlink);
> - return NULL;
> -}
> -
> static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
> {
> struct page *page = NULL;
> @@ -2642,7 +2637,7 @@ static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>
> static const struct inode_operations shmem_short_symlink_operations = {
> .readlink = generic_readlink,
> - .follow_link = shmem_follow_short_symlink,
> + .follow_link = simple_follow_link,
> #ifdef CONFIG_TMPFS_XATTR
> .setxattr = shmem_setxattr,
> .getxattr = shmem_getxattr,
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:26:02

by Jan Kara

Subject: Re: [PATCH 12/79] debugfs: switch to simple_follow_link()

On Tue 05-05-15 06:21:46, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/debugfs/file.c | 12 ------------
> fs/debugfs/inode.c | 6 +++---
> include/linux/debugfs.h | 1 -
> 3 files changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index 830a7e7..284f9aa 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -17,7 +17,6 @@
> #include <linux/fs.h>
> #include <linux/seq_file.h>
> #include <linux/pagemap.h>
> -#include <linux/namei.h>
> #include <linux/debugfs.h>
> #include <linux/io.h>
> #include <linux/slab.h>
> @@ -43,17 +42,6 @@ const struct file_operations debugfs_file_operations = {
> .llseek = noop_llseek,
> };
>
> -static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - nd_set_link(nd, d_inode(dentry)->i_private);
> - return NULL;
> -}
> -
> -const struct inode_operations debugfs_link_operations = {
> - .readlink = generic_readlink,
> - .follow_link = debugfs_follow_link,
> -};
> -
> static int debugfs_u8_set(void *data, u64 val)
> {
> *(u8 *)data = val;
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index c1e7ffb..7eaec88 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -174,7 +174,7 @@ static void debugfs_evict_inode(struct inode *inode)
> truncate_inode_pages_final(&inode->i_data);
> clear_inode(inode);
> if (S_ISLNK(inode->i_mode))
> - kfree(inode->i_private);
> + kfree(inode->i_link);
> }
>
> static const struct super_operations debugfs_super_operations = {
> @@ -511,8 +511,8 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
> return failed_creating(dentry);
> }
> inode->i_mode = S_IFLNK | S_IRWXUGO;
> - inode->i_op = &debugfs_link_operations;
> - inode->i_private = link;
> + inode->i_op = &simple_symlink_inode_operations;
> + inode->i_link = link;
> d_instantiate(dentry, inode);
> return end_creating(dentry);
> }
> diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
> index cb25af4..420311b 100644
> --- a/include/linux/debugfs.h
> +++ b/include/linux/debugfs.h
> @@ -45,7 +45,6 @@ extern struct dentry *arch_debugfs_dir;
>
> /* declared over in file.c */
> extern const struct file_operations debugfs_file_operations;
> -extern const struct inode_operations debugfs_link_operations;
>
> struct dentry *debugfs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> --
> 2.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 09:28:00

by Jan Kara

Subject: Re: [PATCH 16/79] jfs: switch to simple_follow_link()

On Tue 05-05-15 06:21:50, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/jfs/inode.c | 3 ++-
> fs/jfs/namei.c | 5 ++---
> fs/jfs/symlink.c | 10 +---------
> 3 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
> index 070dc4b..6f1cb2b 100644
> --- a/fs/jfs/inode.c
> +++ b/fs/jfs/inode.c
> @@ -63,11 +63,12 @@ struct inode *jfs_iget(struct super_block *sb, unsigned long ino)
> inode->i_mapping->a_ops = &jfs_aops;
> } else {
> inode->i_op = &jfs_fast_symlink_inode_operations;
> + inode->i_link = JFS_IP(inode)->i_inline;
> /*
> * The inline data should be null-terminated, but
> * don't let on-disk corruption crash the kernel
> */
> - JFS_IP(inode)->i_inline[inode->i_size] = '\0';
> + inode->i_link[inode->i_size] = '\0';
> }
> } else {
> inode->i_op = &jfs_file_inode_operations;
> diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
> index 66db7bc..e33be92 100644
> --- a/fs/jfs/namei.c
> +++ b/fs/jfs/namei.c
> @@ -880,7 +880,6 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
> int ssize; /* source pathname size */
> struct btstack btstack;
> struct inode *ip = d_inode(dentry);
> - unchar *i_fastsymlink;
> s64 xlen = 0;
> int bmask = 0, xsize;
> s64 xaddr;
> @@ -946,8 +945,8 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
> if (ssize <= IDATASIZE) {
> ip->i_op = &jfs_fast_symlink_inode_operations;
>
> - i_fastsymlink = JFS_IP(ip)->i_inline;
> - memcpy(i_fastsymlink, name, ssize);
> + ip->i_link = JFS_IP(ip)->i_inline;
> + memcpy(ip->i_link, name, ssize);
> ip->i_size = ssize - 1;
>
> /*
> diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
> index 80f42bc..5929e23 100644
> --- a/fs/jfs/symlink.c
> +++ b/fs/jfs/symlink.c
> @@ -17,21 +17,13 @@
> */
>
> #include <linux/fs.h>
> -#include <linux/namei.h>
> #include "jfs_incore.h"
> #include "jfs_inode.h"
> #include "jfs_xattr.h"
>
> -static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - char *s = JFS_IP(d_inode(dentry))->i_inline;
> - nd_set_link(nd, s);
> - return NULL;
> -}
> -
> const struct inode_operations jfs_fast_symlink_inode_operations = {
> .readlink = generic_readlink,
> - .follow_link = jfs_follow_link,
> + .follow_link = simple_follow_link,
> .setattr = jfs_setattr,
> .setxattr = jfs_setxattr,
> .getxattr = jfs_getxattr,
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-05 16:08:43

by Linus Torvalds

Subject: Re: [PATCH 04/79] ext4: split inode_operations for encrypted symlinks off the rest

On Mon, May 4, 2015 at 10:21 PM, Al Viro <[email protected]> wrote:
> } else if (S_ISLNK(inode->i_mode)) {
> - if (ext4_inode_is_fast_symlink(inode) &&
> - !ext4_encrypted_inode(inode)) {
> + if (ext4_encrypted_inode(inode)) {
> +#ifdef CONFIG_EXT4_FS_ENCRYPTION
> + inode->i_op = &ext4_encrypted_symlink_inode_operations;
> + ext4_set_aops(inode);
> +#else
> + BUILD_BUG();
> +#endif
> + } else if (ext4_inode_is_fast_symlink(inode)) {
> inode->i_op = &ext4_fast_symlink_inode_operations;
> nd_terminate_link(ei->i_data, inode->i_size,
> sizeof(ei->i_data) - 1);

Ugh. Could we aim to *not* add code like this?

Instead, just declare (but don't define) the
ext4_encrypted_symlink_inode_operations thing, so that *if* somebody
uses it they get a link error, and make sure that
"ext4_encrypted_inode()" ends up always returning zero when encryption
isn't enabled, so that the compiler will actually optimize the whole
thing out (which apparently is already the case, judging by the
build-bug-on).

I really prefer to not see #ifdef's inside the middle of anything but
trivial helper functions. I know we have them, and I wish we didn't,
but at least we can aim to not add more of them.

Linus
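
The "declare but don't define" pattern suggested above can be sketched in plain C. Everything named `sketch_*` and the `SKETCH_FS_ENCRYPTION` switch are hypothetical; this is not the ext4 code. The sketch uses a macro that expands to the constant 0 when the feature is off, so the guarded branch is visibly dead, and the reference to the undefined object is expected to be discarded. Whether that happens without optimization depends on the compiler (kernel builds are always optimized), so treat the example as an assumption-laden illustration.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_ops {
	const char *name;
};

/* Declared unconditionally; defined only when the feature is built in. */
extern const struct sketch_ops sketch_encrypted_symlink_ops;

static const struct sketch_ops sketch_fast_symlink_ops = { "fast" };

#ifdef SKETCH_FS_ENCRYPTION
const struct sketch_ops sketch_encrypted_symlink_ops = { "encrypted" };
#define sketch_encrypted_inode(flags)	(((flags) & 1u) != 0)
#else
/* Compile-time constant, so the guarded branch below is provably dead. */
#define sketch_encrypted_inode(flags)	0
#endif

/* The caller stays free of #ifdef blocks. */
static const struct sketch_ops *sketch_pick_ops(unsigned int flags)
{
	assert((flags & ~1u) == 0);	/* only bit 0 exists in this sketch */

	if (sketch_encrypted_inode(flags))
		return &sketch_encrypted_symlink_ops;
	return &sketch_fast_symlink_ops;
}
```

If some code path did reference `sketch_encrypted_symlink_ops` outside a dead branch while the feature is disabled, the build would stop with an undefined-symbol error at link time, which is the safety net the `BUILD_BUG()` was providing.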

2015-05-05 16:08:55

by Linus Torvalds

Subject: Re: [PATCH 40/79] link_path_walk: turn inner loop into explicit goto

On Mon, May 4, 2015 at 10:22 PM, Al Viro <[email protected]> wrote:
> +l:

This looked like noise.

PLEASE. We're not programming in Pascal (and thank all Gods for that),
so we can have labels that have meaningful names. Also, we're not
ashamed of using goto where it makes sense, so we don't need to try to
hide the labels by making them look like specks of dirt on our
monitor.

So give the label a real name that describes what it is. Not this kind of thing.

Linus

2015-05-05 16:09:04

by Al Viro

Subject: Re: [PATCH 04/79] ext4: split inode_operations for encrypted symlinks off the rest

On Tue, May 05, 2015 at 08:01:52AM -0700, Linus Torvalds wrote:
> Ugh. Could we aim to *not* add code like this?
>
> Instead, just declare (but don't define) the
> ext4_encrypted_symlink_inode_operations thing, so that *if* somebody
> uses it they get a link error, and make sure that
> "ext4_encrypted_inode()" ends up always returning zero when encryption
> isn't enabled, so that the compiler will actually optimize the whole
> thing out (which apparently is already the case, judging by the
> build-bug-on).

Sure, no problem. BUILD_BUG() was used only because it triggered an
error earlier; I simply forgot to rip it out afterwards.

2015-05-05 16:09:00

by Linus Torvalds

Subject: Re: [PATCH 42/79] link_path_walk: get rid of duplication

On Mon, May 4, 2015 at 10:22 PM, Al Viro <[email protected]> wrote:
> -l:

Ok, I guess this gets rid of the nasty badly-named label I complained about.

Linus

2015-05-05 16:09:49

by Al Viro

Subject: Re: [PATCH 40/79] link_path_walk: turn inner loop into explicit goto

On Tue, May 05, 2015 at 08:09:04AM -0700, Linus Torvalds wrote:
> On Mon, May 4, 2015 at 10:22 PM, Al Viro <[email protected]> wrote:
> > +l:
>
> This looked like noise.
>
> PLEASE. We're not programming in Pascal (and thank all Gods for that),
> so we can have labels that have meaningful names. Also, we're not
> ashamed of using goto where it makes sense, so we don't need to try to
> hide the labels by making them look like specks of dirt on our
> monitor.
>
> So give the label a real name that describes what it is. Not this kind of thing.

Something like this_used_to_be_the_beginning_of_loop_body?

Seriously, it's a strictly temporary thing - it lives for two commits and
gets killed after that, exactly because it has no good inherent meaning.
The label replacing it two commits later (several lines upstream) does,
and it gets a better name...

2015-05-05 16:09:57

by Linus Torvalds

Subject: Re: [RFC][PATCHSET] non-recursive pathname resolution

On Mon, May 4, 2015 at 10:22 PM, Al Viro <[email protected]> wrote:
> My apologies for the patchbomb from hell, but I really think
> it needs a public review.

Aside from the small nits I noted (and I guess I can even live with
the bad label name as long as it never gets resurrected) it looks
good. But I'd obviously like it to get as much testing/beating on as
possible before merging: I'm assuming it will have gone at least
through xfstests etc.

Linus

2015-05-05 16:10:00

by Linus Torvalds

Subject: Re: [PATCH 40/79] link_path_walk: turn inner loop into explicit goto

On Tue, May 5, 2015 at 8:17 AM, Al Viro <[email protected]> wrote:
>
> Something like this_used_to_be_the_beginning_of_loop_body?

No, but at least "loop:" or "repeat:" or something.

That "l:" really looked like a fly shat on my monitor.

Linus

2015-05-05 23:59:52

by Al Viro

Subject: Re: [RFC][PATCHSET] non-recursive pathname resolution

On Tue, May 05, 2015 at 08:20:16AM -0700, Linus Torvalds wrote:
> On Mon, May 4, 2015 at 10:22 PM, Al Viro <[email protected]> wrote:
> > My apologies for the patchbomb from hell, but I really think
> > it needs a public review.
>
> Aside from the small nits I noted (and I guess I can even live with
> the bad label name as long as it never gets resurrected) it looks
> good. But I'd obviously like it to get as much testing/beating on as
> possible before merging: I'm assuming it will have gone at least
> through xfstests etc.

It is passing xfstests and LTP, plus some basic "create a twisted forest
of symlinks and walk it" tests, but yes, it obviously needs more beating.
I'll push everything up to #76 into -next tonight (with the changes you
asked for).

As for the next steps...
* we want _all_ failure exits from RCU mode to empty the stack
immediately - it will be too late to call ->put_link() once we dropped
rcu_read_lock().
* we want failure exits from complete_walk() to empty the stack,
for the sake of consistency (due to the above it will do so when called in
RCU mode).
* we want to postpone assignment to nd->seq until after we are
sure we haven't found a symlink to follow; I have that in local queue,
will post in the next batch for review.
* we want to have pick_link() store directly into nd->stack[...].link;
no point postponing that to get_link(). nd->link can die at that point.
Done in local queue.
* we want to store inode and seq along with path in nd->stack[...].
It means extra 4 words in struct nameidata (EMBEDDED_LEVELS * 2), i.e. not
too horrible wrt stack footprint. should_follow_link()/pick_link() get
those as additional arguments. Done in local queue.
* we need a variant of legitimize_mnt() that would _not_ leave RCU
mode. What I have in mind is
legitimize_mnt(m)
{
switch (__legitimize_mnt(m, seq)) {
case 0:
return true;
default:
rcu_read_unlock();
mntput(m);
rcu_read_lock();
case 1:
return false;
}
}
with the ability to call __legitimize_mnt(), remember the return value and,
in case of failure, do all pending ->put_link() before dropping rcu_read_lock().
We'll need that for unlazy_walk() et al. - do this __legitimize_mnt()
on nd->path.mnt and all nd->stack[0..nd->depth - 1].mnt, and if any of them
fail, stop right there, do ->put_link() on everything, drop rcu_read_lock(),
mntput() everything on stack we'd already legitimized, reset nd->depth to 0
and bugger off the way we currently would. If all of __legitimize_mnt() succeed,
we go and acquire dentry references, checking them against nd->seq for
nd->path and nd->stack[i].seq for nd->stack[i].link. Any failure =>
->put_link() for everything on stack, rcu_read_unlock(), dput() everything
we'd managed to grab in the stack, mntput() everything in the stack,
reset nd->depth to 0 and bugger off the way we currently would.
complete_walk() is similar - what it starts with could be
if (nd->flags & LOOKUP_RCU) {
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
if (unlikely(unlazy_walk(nd, NULL)))
return -ECHILD;
}
and perhaps it would make sense to try just that and see if unlazy_walk()
goes visibly higher in profiles... Note that it *can* be called with
non-empty stack and be required to legitimize its (only) element - e.g.
creat() on a dangling symlink will step into that. So we can't just
start with unconditionally emptying the stack.
terminate_walk() is simpler - if we are in RCU mode, just call
->put_link() on everything in stack and reset nd->depth; if we are not,
leave as in this patchset.
* ->put_link() should get inode + cookie instead of dentry + cookie;
sure, right now the only instance that wants more than cookie is in hppfs
and that's scheduled for removal, but we need the inode to find the damn
method anyway, so passing it is no extra burden. Should be RCU-safe, which
is actually true for all instances.
* ->follow_link() should take dentry + inode, with NULL/inode meaning
"we are in RCU mode, return -ECHILD instead of doing anything you can't do
in RCU pathwalk". get_link() should be basically
touch atime (might throw out of RCU mode)
do security_inode_follow_link(), fuck off if it fails
nd->last_type = LAST_BIND;
if (inode->i_link)
return it, we are done
if (in RCU mode) {
link = ...->follow_link(NULL, inode);
if (link == ERR_PTR(-ECHILD))
try to unlazy, return link if that fails
else
return link;
}
link = ...->follow_link(dentry, inode);
then as in this patchset.
* nd_alloc_stack() should do GFP_ATOMIC allocation in RCU mode and
treat a failure as "try to unlazy, fail with ECHILD if can't do that, proceed
to GFP_KERNEL allocation otherwise".
* may_follow_link() should, in case of failure, check if it's in
RCU mode and do terminate_walk()/fail with ECHILD instead of going into
audit crap. Sure, it'll cost us a restart that will almost certainly end
with hard error on the same symlink, but audit_log_link_denied() isn't
usable in RCU mode. We probably could just make it use GFP_ATOMIC
for allocations, but that's a separate story...
* we need set_root_rcu() to store ->d_seq of root in nameidata,
so that traversing an absolute symlink in RCU mode would be able to start
with correct nd->seq - fetch nd->root.dentry->d_inode, check
nd->root.dentry->d_seq against nd->root_seq, fail with -ECHILD on mismatch
and use inode and root_seq for nd->inode / nd->seq otherwise.
* put_link() should, as in Neil's patchset, have path_put() conditional
on non-RCU mode.
* unlazy_walk() is really unpleasant. In addition to the issues
around legitimizing many vfsmounts without being allowed to drop rcu_read_lock
(discussed above), we have three use cases:
1) unlazy_walk(nd, NULL). legitimize nd->path and everything already
in nd->stack, from 0 to nd->depth - 1. On failure do ->put_link() for
everything on stack *before* dropping rcu_read_lock().
2) unlazy_walk() in lookup_fast(). We want to give it correct seq number
(and I already have that done in local tree) and legitimize that dentry +
what case 1 would do.
3) unlazy between deciding it's a symlink to follow and succeeding with
->follow_link() in get_link(). It's this case that really sucks - we
want to legitimize everything we would in case 1 + path/dentry of our link.
And it _can_ have a different vfsmount - we can't just rely on getting
nd->path.mnt being equal to it. It's actually more like case 1 than case 2 -
"legitimize nd->path and everything on stack from 0 to nd->depth, with
->put_link() in case of failure done only from 0 to nd->depth - 1 (as in
case 1)". We probably should just do this:
m = mnt of pending link
res = __legitimize_mnt(m);
if (likely(!res)) {
d = dentry of pending link
try to legitimize d
if failed
res = 2;
}
if (unlikely(res)) {
terminate_walk(nd);
if (res != 1)
mntput(m);
return -ECHILD;
}
res = unlazy_walk(nd, NULL);
if (unlikely(res)) {
dput(d);
mntput(m);
}
return res;
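
The "legitimize everything or undo it all" step described in the list above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct toy_mnt and the toy_* helpers are hypothetical stand-ins, not kernel interfaces, and the return convention (0 for success, 1 for failure with no reference taken, 2 for failure with a reference that still has to be dropped) is the one suggested by the legitimize_mnt() pseudocode earlier in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vfsmount: a counter and two failure knobs. */
struct toy_mnt {
	int refs;	/* references currently held */
	int stale;	/* fail before any reference is taken */
	int doomed;	/* fail after a reference has been taken */
};

/* 0: success, ref held; 1: failure, no ref; 2: failure, ref must be dropped */
static int toy_try_legitimize(struct toy_mnt *m)
{
	if (m->stale)
		return 1;
	m->refs++;
	if (m->doomed)
		return 2;
	return 0;
}

static void toy_put(struct toy_mnt *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* Legitimize m[0..n-1]; on any failure undo everything and return -ECHILD. */
static int toy_legitimize_all(struct toy_mnt **m, size_t n)
{
	size_t i;
	int res = 0;

	for (i = 0; i < n; i++) {
		res = toy_try_legitimize(m[i]);
		if (res)
			break;
	}
	if (!res)
		return 0;
	/*
	 * The real code would run the pending ->put_link() calls here and
	 * only then leave RCU mode; the puts below model what follows.
	 */
	if (res == 2)
		toy_put(m[i]);		/* the one that failed late */
	while (i-- > 0)
		toy_put(m[i]);		/* the ones that had succeeded */
	return -ECHILD;
}
```

The point of keeping the return value around is that the caller decides when the expensive part of the cleanup happens, rather than the helper leaving RCU mode on its own.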

AFAICS, the above should suffice, modulo teaching ->follow_link() instances
to fail with ECHILD when called in RCU mode and then teaching some of them
_not_ to. Note that fast symlinks would be fine without the last part and
that covers most of the actual use...
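
For illustration, the "try in RCU mode, fall back on -ECHILD" shape of get_link() described above might look like this in a userspace model. Everything here is an assumption made for the sketch: struct toy_walk, struct toy_link and the toy_* functions are hypothetical and only mimic the control flow, with inline_body playing the role of inode->i_link.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_walk {
	int lazy;		/* 1 while in the lock-free ("RCU") mode */
	int can_unlazy;		/* whether leaving lazy mode would succeed */
};

struct toy_link {
	const char *inline_body;	/* non-NULL for fast symlinks */
	const char *slow_body;		/* what a blocking read would return */
};

/* Hypothetical ->follow_link(): refuses to do anything in lazy mode. */
static int toy_follow(const struct toy_link *l, int lazy, const char **res)
{
	if (lazy)
		return -ECHILD;
	*res = l->slow_body;
	return 0;
}

static int toy_unlazy(struct toy_walk *w)
{
	if (!w->can_unlazy)
		return -ECHILD;
	w->lazy = 0;
	return 0;
}

static int toy_get_link(struct toy_walk *w, const struct toy_link *l,
			const char **res)
{
	int err;

	if (l->inline_body) {		/* fast symlink: no method call needed */
		*res = l->inline_body;
		return 0;
	}
	if (w->lazy) {
		err = toy_follow(l, 1, res);
		if (err != -ECHILD)
			return err;	/* the method coped with lazy mode */
		err = toy_unlazy(w);
		if (err)
			return err;	/* caller restarts in non-lazy mode */
	}
	return toy_follow(l, 0, res);
}
```

In this model a fast symlink never leaves lazy mode, which is the case the message above expects to cover most actual use.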

2015-05-06 02:43:06

by NeilBrown

Subject: Re: [PATCH 47/79] namei: move link/cookie pairs into nameidata

On Tue, 5 May 2015 06:22:21 +0100 Al Viro <[email protected]> wrote:

> From: Al Viro <[email protected]>
>
> Array of MAX_NESTED_LINKS + 1 elements put into nameidata;
> what used to be a local array in link_path_walk() occupies
> entries 1 .. MAX_NESTED_LINKS in it, link and cookie from
> the trailing symlink handling loops - entry 0.
>
> This is _not_ the final arrangement; just an easily verified
> incremental step.
>
> Signed-off-by: Al Viro <[email protected]>
> ---
> fs/namei.c | 35 ++++++++++++++++++-----------------
> 1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 38ff968..e6d54ec 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -505,6 +505,11 @@ struct nameidata {
> int last_type;
> unsigned depth;
> struct file *base;
> + struct saved {
> + struct path link;
> + void *cookie;
> + const char *name;
> + } stack[MAX_NESTED_LINKS + 1];
> };
>
> /*
> @@ -1703,11 +1708,7 @@ static inline u64 hash_name(const char *name)
> */
> static int link_path_walk(const char *name, struct nameidata *nd)
> {
> - struct saved {
> - struct path link;
> - void *cookie;
> - const char *name;
> - } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
> + struct saved *last = nd->stack;

This looks wrong. 'last' used to be dependent on nd->depth, now it isn't.

Three patches later we have:

+const char *get_link(struct nameidata *nd)
{
- struct dentry *dentry = link->dentry;
+ struct saved *last = nd->stack + nd->depth;
+ struct dentry *dentry = nd->link.dentry;
struct inode *inode = dentry->d_inode;

and

static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct saved *last = nd->stack;
int err;

....
- last->link = nd->link;
- s = get_link(&last->link, nd, &last->cookie);
+ s = get_link(nd);


so the nd->depth offset is restored in get_link() and the net result appears
correct, but the intermediate states appear to be wrong.

NeilBrown



2015-05-06 02:55:42

by NeilBrown

Subject: Re: [PATCH 51/79] namei: remove restrictions on nesting depth

On Tue, 5 May 2015 06:22:25 +0100 Al Viro <[email protected]> wrote:

> From: Al Viro <[email protected]>
>
> The only restriction is that on the total amount of symlinks
> crossed; how they are nested does not matter
>
> Signed-off-by: Al Viro <[email protected]>
> ---


> - if (unlikely(current->total_link_count >= 40)) {
> + if (unlikely(current->total_link_count >= MAXSYMLINKS)) {

There is still a literal '40' in follow_automount.

current->total_link_count++;
if (current->total_link_count >= 40)
return -ELOOP;


should that become MAXSYMLINKS too?

Thanks,
NeilBrown



2015-05-06 07:22:28

by Al Viro

Subject: Re: [PATCH 47/79] namei: move link/cookie pairs into nameidata

On Wed, May 06, 2015 at 12:42:54PM +1000, NeilBrown wrote:
> > static int link_path_walk(const char *name, struct nameidata *nd)
> > {
> > - struct saved {
> > - struct path link;
> > - void *cookie;
> > - const char *name;
> > - } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
> > + struct saved *last = nd->stack;
>
> This looks wrong. 'last' used to be dependent on nd->depth, now it isn't.

On the entry to link_path_walk() we have nd->depth == 0.

And nd->depth and last are changed in sync.
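
A small userspace model of that invariant, as a sketch only: the names are hypothetical, TOY_MAX_NESTED is an assumed value standing in for MAX_NESTED_LINKS, and the push/pop helpers merely imitate the way depth and the cursor move together in the patch.

```c
#include <assert.h>

#define TOY_MAX_NESTED 8	/* assumed; stands in for MAX_NESTED_LINKS */

struct toy_saved {
	const char *name;
};

struct toy_nd {
	unsigned depth;
	struct toy_saved stack[TOY_MAX_NESTED + 1];	/* entry 0 + nested ones */
};

struct toy_cursor {
	struct toy_nd *nd;
	struct toy_saved *last;
};

/* Entry to the walk: depth is 0, so the cursor points at entry 0. */
static void toy_start(struct toy_cursor *c, struct toy_nd *nd)
{
	nd->depth = 0;
	c->nd = nd;
	c->last = nd->stack;
}

/* Going into a nested symlink: both move up by one. */
static void toy_push(struct toy_cursor *c, const char *name)
{
	assert(c->nd->depth < TOY_MAX_NESTED);
	c->nd->depth++;
	c->last++;
	c->last->name = name;
}

/* Coming back out: both move down by one. */
static const char *toy_pop(struct toy_cursor *c)
{
	const char *name = c->last->name;

	assert(c->nd->depth > 0);
	c->nd->depth--;
	c->last--;
	return name;
}

/* The invariant: last == nd->stack + nd->depth at every step. */
static int toy_in_sync(const struct toy_cursor *c)
{
	return c->last == c->nd->stack + c->nd->depth;
}
```

Since the invariant holds after every push and pop, computing the pointer from nd->depth later on (as get_link() does a few patches further) gives the same value as carrying it in a local variable.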

2015-05-06 07:24:12

by Al Viro

Subject: Re: [PATCH 51/79] namei: remove restrictions on nesting depth

On Wed, May 06, 2015 at 12:55:28PM +1000, NeilBrown wrote:

> > - if (unlikely(current->total_link_count >= 40)) {
> > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
>
> There is still a literal '40' in follow_automount.
>
> current->total_link_count++;
> if (current->total_link_count >= 40)
> return -ELOOP;
>
>
> should that become MAXSYMLINKS too?

Yes, assuming we want to keep it at all...

2015-05-06 09:59:18

by Boaz Harrosh

Subject: Re: [PATCH 18/79] exofs: switch to {simple,page}_symlink_inode_operations

On 05/05/2015 08:21 AM, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>

ACK-by: Boaz Harrosh <[email protected]>

Thanks Al, so much nicer (and safer)
Boaz

> ---
> fs/exofs/Kbuild | 2 +-
> fs/exofs/exofs.h | 4 ----
> fs/exofs/inode.c | 9 +++++----
> fs/exofs/namei.c | 5 +++--
> fs/exofs/symlink.c | 55 ------------------------------------------------------
> 5 files changed, 9 insertions(+), 66 deletions(-)
> delete mode 100644 fs/exofs/symlink.c
>
> diff --git a/fs/exofs/Kbuild b/fs/exofs/Kbuild
> index b47c7b8..a364fd0 100644
> --- a/fs/exofs/Kbuild
> +++ b/fs/exofs/Kbuild
> @@ -16,5 +16,5 @@
> libore-y := ore.o ore_raid.o
> obj-$(CONFIG_ORE) += libore.o
>
> -exofs-y := inode.o file.o symlink.o namei.o dir.o super.o sys.o
> +exofs-y := inode.o file.o namei.o dir.o super.o sys.o
> obj-$(CONFIG_EXOFS_FS) += exofs.o
> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
> index ad9cac6..2e86086 100644
> --- a/fs/exofs/exofs.h
> +++ b/fs/exofs/exofs.h
> @@ -207,10 +207,6 @@ extern const struct address_space_operations exofs_aops;
> extern const struct inode_operations exofs_dir_inode_operations;
> extern const struct inode_operations exofs_special_inode_operations;
>
> -/* symlink.c */
> -extern const struct inode_operations exofs_symlink_inode_operations;
> -extern const struct inode_operations exofs_fast_symlink_inode_operations;
> -
> /* exofs_init_comps will initialize an ore_components device array
> * pointing to a single ore_comp struct, and a round-robin view
> * of the device table.
> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
> index 786e4cc..73c64da 100644
> --- a/fs/exofs/inode.c
> +++ b/fs/exofs/inode.c
> @@ -1222,10 +1222,11 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
> inode->i_fop = &exofs_dir_operations;
> inode->i_mapping->a_ops = &exofs_aops;
> } else if (S_ISLNK(inode->i_mode)) {
> - if (exofs_inode_is_fast_symlink(inode))
> - inode->i_op = &exofs_fast_symlink_inode_operations;
> - else {
> - inode->i_op = &exofs_symlink_inode_operations;
> + if (exofs_inode_is_fast_symlink(inode)) {
> + inode->i_op = &simple_symlink_inode_operations;
> + inode->i_link = (char *)oi->i_data;
> + } else {
> + inode->i_op = &page_symlink_inode_operations;
> inode->i_mapping->a_ops = &exofs_aops;
> }
> } else {
> diff --git a/fs/exofs/namei.c b/fs/exofs/namei.c
> index 5ae25e4..09a6bb1 100644
> --- a/fs/exofs/namei.c
> +++ b/fs/exofs/namei.c
> @@ -113,7 +113,7 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
> oi = exofs_i(inode);
> if (l > sizeof(oi->i_data)) {
> /* slow symlink */
> - inode->i_op = &exofs_symlink_inode_operations;
> + inode->i_op = &page_symlink_inode_operations;
> inode->i_mapping->a_ops = &exofs_aops;
> memset(oi->i_data, 0, sizeof(oi->i_data));
>
> @@ -122,7 +122,8 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
> goto out_fail;
> } else {
> /* fast symlink */
> - inode->i_op = &exofs_fast_symlink_inode_operations;
> + inode->i_op = &simple_symlink_inode_operations;
> + inode->i_link = (char *)oi->i_data;
> memcpy(oi->i_data, symname, l);
> inode->i_size = l-1;
> }
> diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
> deleted file mode 100644
> index 6f6f3a4..0000000
> --- a/fs/exofs/symlink.c
> +++ /dev/null
> @@ -1,55 +0,0 @@
> -/*
> - * Copyright (C) 2005, 2006
> - * Avishay Traeger ([email protected])
> - * Copyright (C) 2008, 2009
> - * Boaz Harrosh <[email protected]>
> - *
> - * Copyrights for code taken from ext2:
> - * Copyright (C) 1992, 1993, 1994, 1995
> - * Remy Card ([email protected])
> - * Laboratoire MASI - Institut Blaise Pascal
> - * Universite Pierre et Marie Curie (Paris VI)
> - * from
> - * linux/fs/minix/inode.c
> - * Copyright (C) 1991, 1992 Linus Torvalds
> - *
> - * This file is part of exofs.
> - *
> - * exofs is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation. Since it is based on ext2, and the only
> - * valid version of GPL for the Linux kernel is version 2, the only valid
> - * version of GPL for exofs is version 2.
> - *
> - * exofs is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with exofs; if not, write to the Free Software
> - * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> - */
> -
> -#include <linux/namei.h>
> -
> -#include "exofs.h"
> -
> -static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - struct exofs_i_info *oi = exofs_i(d_inode(dentry));
> -
> - nd_set_link(nd, (char *)oi->i_data);
> - return NULL;
> -}
> -
> -const struct inode_operations exofs_symlink_inode_operations = {
> - .readlink = generic_readlink,
> - .follow_link = page_follow_link_light,
> - .put_link = page_put_link,
> -};
> -
> -const struct inode_operations exofs_fast_symlink_inode_operations = {
> - .readlink = generic_readlink,
> - .follow_link = exofs_follow_link,
> -};
>

2015-05-06 13:41:06

by Dave Kleikamp

Subject: Re: [PATCH 16/79] jfs: switch to simple_follow_link()

On 05/05/2015 12:21 AM, Al Viro wrote:
> From: Al Viro <[email protected]>
>
> Signed-off-by: Al Viro <[email protected]>
Acked-by: Dave Kleikamp <[email protected]>

> ---
> fs/jfs/inode.c | 3 ++-
> fs/jfs/namei.c | 5 ++---
> fs/jfs/symlink.c | 10 +---------
> 3 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
> index 070dc4b..6f1cb2b 100644
> --- a/fs/jfs/inode.c
> +++ b/fs/jfs/inode.c
> @@ -63,11 +63,12 @@ struct inode *jfs_iget(struct super_block *sb, unsigned long ino)
> inode->i_mapping->a_ops = &jfs_aops;
> } else {
> inode->i_op = &jfs_fast_symlink_inode_operations;
> + inode->i_link = JFS_IP(inode)->i_inline;
> /*
> * The inline data should be null-terminated, but
> * don't let on-disk corruption crash the kernel
> */
> - JFS_IP(inode)->i_inline[inode->i_size] = '\0';
> + inode->i_link[inode->i_size] = '\0';
> }
> } else {
> inode->i_op = &jfs_file_inode_operations;
> diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
> index 66db7bc..e33be92 100644
> --- a/fs/jfs/namei.c
> +++ b/fs/jfs/namei.c
> @@ -880,7 +880,6 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
> int ssize; /* source pathname size */
> struct btstack btstack;
> struct inode *ip = d_inode(dentry);
> - unchar *i_fastsymlink;
> s64 xlen = 0;
> int bmask = 0, xsize;
> s64 xaddr;
> @@ -946,8 +945,8 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
> if (ssize <= IDATASIZE) {
> ip->i_op = &jfs_fast_symlink_inode_operations;
>
> - i_fastsymlink = JFS_IP(ip)->i_inline;
> - memcpy(i_fastsymlink, name, ssize);
> + ip->i_link = JFS_IP(ip)->i_inline;
> + memcpy(ip->i_link, name, ssize);
> ip->i_size = ssize - 1;
>
> /*
> diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
> index 80f42bc..5929e23 100644
> --- a/fs/jfs/symlink.c
> +++ b/fs/jfs/symlink.c
> @@ -17,21 +17,13 @@
> */
>
> #include <linux/fs.h>
> -#include <linux/namei.h>
> #include "jfs_incore.h"
> #include "jfs_inode.h"
> #include "jfs_xattr.h"
>
> -static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
> -{
> - char *s = JFS_IP(d_inode(dentry))->i_inline;
> - nd_set_link(nd, s);
> - return NULL;
> -}
> -
> const struct inode_operations jfs_fast_symlink_inode_operations = {
> .readlink = generic_readlink,
> - .follow_link = jfs_follow_link,
> + .follow_link = simple_follow_link,
> .setattr = jfs_setattr,
> .setxattr = jfs_setxattr,
> .getxattr = jfs_getxattr,
>
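
As a rough userspace illustration of what the conversion relies on: the inode carries a pointer to its own inline symlink body, and following the link just returns that pointer. The struct, the TOY_IDATASIZE value and the function names below are assumptions for the sketch, not the jfs or libfs definitions.

```c
#include <assert.h>
#include <string.h>

#define TOY_IDATASIZE 16	/* assumed; not the real IDATASIZE */

struct toy_inode {
	size_t i_size;			/* length of the link body */
	char *i_link;			/* points at the inline body, if any */
	char i_inline[TOY_IDATASIZE];
};

/* Set up a fast symlink; refuse bodies that would not fit inline. */
static int toy_setup_fast_symlink(struct toy_inode *inode)
{
	if (inode->i_size >= TOY_IDATASIZE)
		return -1;
	inode->i_link = inode->i_inline;
	/* Do not trust the stored bytes to be NUL-terminated. */
	inode->i_link[inode->i_size] = '\0';
	return 0;
}

/* What a simple_follow_link()-style method boils down to in this model. */
static const char *toy_simple_follow_link(const struct toy_inode *inode)
{
	return inode->i_link;
}
```

With the pointer set once at inode setup time, no per-filesystem ->follow_link() is needed for the inline case.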

2015-05-06 16:22:36

by Al Viro

Subject: Re: [PATCH 51/79] namei: remove restrictions on nesting depth

On Wed, May 06, 2015 at 08:24:07AM +0100, Al Viro wrote:
> On Wed, May 06, 2015 at 12:55:28PM +1000, NeilBrown wrote:
>
> > > - if (unlikely(current->total_link_count >= 40)) {
> > > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
> >
> > There is still a literal '40' in follow_automount.
> >
> > current->total_link_count++;
> > if (current->total_link_count >= 40)
> > return -ELOOP;
> >
> >
> > should that become MAXSYMLINKS too?
>
> Yes, assuming we want to keep it at all...

To elaborate a bit - the reason why we count those among the symlink crossings
is historical; we used to implement automounting via ->follow_link().

Note that it's not really consistent - once a process has triggered automount,
subsequent lookups crossing that point *won't* have it counted as link
traversal. Except for ones that come while automount attempt is in progress,
etc.

Limit on link traversals makes obvious sense wrt avoiding loops and DoS -
reaching a symlink in effect is adding a pile of path components to the
path yet to be traversed. Automount points do nothing of that sort -
the pending path remains as-is.
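
To make the counting concrete, here is a toy userspace resolver that follows symlinks iteratively and fails with -ELOOP once the total number of traversals reaches the limit. It is a sketch under assumptions: the table-based "filesystem", struct toy_entry and the toy_* functions are hypothetical, and each link target is a single name rather than a multi-component path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXSYMLINKS 40

struct toy_entry {
	const char *name;
	const char *target;	/* NULL: not a symlink */
};

static const struct toy_entry *toy_lookup(const struct toy_entry *tab,
					  size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Follow links without recursion; only the total count is limited. */
static int toy_resolve(const struct toy_entry *tab, size_t n,
		       const char *name, const char **res)
{
	unsigned total_link_count = 0;

	for (;;) {
		const struct toy_entry *e = toy_lookup(tab, n, name);

		if (!e)
			return -ENOENT;
		if (!e->target) {
			*res = e->name;
			return 0;
		}
		if (total_link_count >= TOY_MAXSYMLINKS)
			return -ELOOP;
		total_link_count++;
		name = e->target;	/* more path to walk; no deeper stack */
	}
}
```

Each traversal replaces the name still to be resolved, which is the effect the limit is meant to bound; an automount point has no counterpart in this model because it leaves the pending path untouched.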

So I'm not sure it makes sense to have those contribute to the same counter.
OTOH, there _is_ an unpleasant problem around follow_automount() - it's
a place where we call a method with heavy stack footprint still fairly
deep in stack; in mainline it can get as bad as ~2.8K deep, with this tree
the worst call chain entirely inside fs/namei.c is
SyS_renameat2 -> user_path_parent -> filename_lookup -> path_lookupat ->
path_init -> link_path_walk -> walk_component -> lookup_fast ->
follow_managed -> ->d_automount
and it costs 1.1K; slightly more shallow chains lead from linkat(2) and
do_filp_open()/do_file_open_root(). Again, mainline is more than two times
worse on those.

->d_manage() gets hit at the same depth, ->d_revalidate() at a bit less than
that.

FWIW, the sums on a fairly sane amd64 config are
[->d_manage] 1104
[->d_automount] 1104
[->d_revalidate] 1024
[->lookup] 992
[->put_link] 896
[->permission] 832
[->follow_link] 768
[->d_hash] 768
[->d_weak_revalidate] 624
[->create] 544
[->tmpfile] 480
[->atomic_open] 480
[->rename] 336
[->rename2] 336
[->link] 240
[->unlink] 224
[->rmdir] 176
[->mknod] 160
[->symlink] 144
[->mkdir] 144

And ->d_automount() is easily the heaviest of that bunch in terms of stack
footprint. So much that I really wonder if we ought to use something like
schedule_work() and (interruptibly) wait for completion; the interesting
question here is how much of the initiator's state is needed by worker.
The real namespace isn't - actual mounting is in finish_automount().
Credentials probably are; what about netns? Suppose a process steps on
NFS4 referral point and triggers a new NFS mount; which netns should be
used - that of the triggering process or that of the NFS mount where we'd
run into the referral? Ditto for AFS and CIFS...

BTW, if you want to torture the whole thing wrt stack footprint, try to make
/lib/firmware a symlink that would lead you through a referral point (on
the mainline - do so at the maximal nesting depth). The thing is,
request_firmware() will chase that one *on* *caller's* *stack*. Seeing that
we do have an async variant anyway (request_firmware_nowait()), could we
have request_firmware() itself go through schedule_work() (and wait for
completion, of course)?

2015-05-10 05:45:20

by Al Viro

Subject: Re: [RFC][PATCHSET] non-recursive pathname resolution

On Wed, May 06, 2015 at 12:59:47AM +0100, Al Viro wrote:

> It is passing xfstests and LTP, plus some basic "create a twisted forest
> of symlinks and walk it" tests, but yes, it obviously needs more beating.
> I'll push everything up to #76 into -next tonight (with the changes you
> asked for).

[snip]

> AFAICS, the above should suffice, modulo teaching ->follow_link() instances
> to fail with ECHILD when called in RCU mode and then teaching some of them
> _not_ to. Note that fast symlinks would be fine without the last part and
> that covers most of the actual use...

FWIW, right now vfs.git#link_path_walk seems to be working and doing just
that - there's still some reordering to do in the new parts of queue, and
right now it's dropping out of RCU mode as soon as it runs into a non-trivial
symlink (== not a fast one), but the common case is handled without dropping
out of RCU mode until the very end. I'll do reordering tomorrow (along with
adding the missing documentation, etc.) and post the result for review, then
into -next it goes... Right now it's 107 commits on top of what already
went into mainline (1 from dhowells, 6 from neilb, the rest from me), so
it's going to be another patchbomb from hell ;-/

Missing bits: teaching ->follow_link() to try and remain in RCU mode (IMO
it should be getting (dentry, inode, &cookie) and have RCU mode indicated by
dentry == NULL), figuring out what to do with automounts (I really think
that ->d_automount() is worth offloading via schedule_work() or something
similar) and more testing/profiling/tuning the inlining/etc.

2015-05-11 18:08:17

by Al Viro

Subject: [PATCH v3 001/110] 9p: don't bother with 4K allocation for 24-byte local array...

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/9p/vfs_inode.c | 26 +++++---------------------
1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 703342e..cda68f7 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1370,6 +1370,8 @@ v9fs_vfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
return v9fs_vfs_mkspecial(dir, dentry, P9_DMSYMLINK, symname);
}

+#define U32_MAX_DIGITS 10
+
/**
* v9fs_vfs_link - create a hardlink
* @old_dentry: dentry for file to link to
@@ -1383,7 +1385,7 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
int retval;
- char *name;
+ char name[1 + U32_MAX_DIGITS + 2]; /* sign + number + \n + \0 */
struct p9_fid *oldfid;

p9_debug(P9_DEBUG_VFS, " %lu,%pd,%pd\n",
@@ -1393,20 +1395,12 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
if (IS_ERR(oldfid))
return PTR_ERR(oldfid);

- name = __getname();
- if (unlikely(!name)) {
- retval = -ENOMEM;
- goto clunk_fid;
- }
-
sprintf(name, "%d\n", oldfid->fid);
retval = v9fs_vfs_mkspecial(dir, dentry, P9_DMLINK, name);
- __putname(name);
if (!retval) {
v9fs_refresh_inode(oldfid, d_inode(old_dentry));
v9fs_invalidate_inode_attr(dir);
}
-clunk_fid:
p9_client_clunk(oldfid);
return retval;
}
@@ -1425,7 +1419,7 @@ v9fs_vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rde
{
struct v9fs_session_info *v9ses = v9fs_inode2v9ses(dir);
int retval;
- char *name;
+ char name[2 + U32_MAX_DIGITS + 1 + U32_MAX_DIGITS + 1];
u32 perm;

p9_debug(P9_DEBUG_VFS, " %lu,%pd mode: %hx MAJOR: %u MINOR: %u\n",
@@ -1435,26 +1429,16 @@ v9fs_vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rde
if (!new_valid_dev(rdev))
return -EINVAL;

- name = __getname();
- if (!name)
- return -ENOMEM;
/* build extension */
if (S_ISBLK(mode))
sprintf(name, "b %u %u", MAJOR(rdev), MINOR(rdev));
else if (S_ISCHR(mode))
sprintf(name, "c %u %u", MAJOR(rdev), MINOR(rdev));
- else if (S_ISFIFO(mode))
- *name = 0;
- else if (S_ISSOCK(mode))
+ else
*name = 0;
- else {
- __putname(name);
- return -EINVAL;
- }

perm = unixmode2p9mode(v9ses, mode);
retval = v9fs_vfs_mkspecial(dir, dentry, perm, name);
- __putname(name);

return retval;
}
--
2.1.4
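
The buffer sizes used in the patch above can be checked with a short userspace sketch. It assumes a 32-bit int and unsigned (so at most 10 decimal digits, as U32_MAX_DIGITS implies); the macro names LINK_NAME_LEN and MKNOD_NAME_LEN and the toy_* helpers are made up for the illustration, and snprintf() is used here instead of the kernel's sprintf() so that the needed length can be inspected safely.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define U32_MAX_DIGITS 10

/* "%d\n": sign + digits + newline + NUL */
#define LINK_NAME_LEN	(1 + U32_MAX_DIGITS + 2)
/* "b %u %u": type letter and space, digits, space, digits, NUL */
#define MKNOD_NAME_LEN	(2 + U32_MAX_DIGITS + 1 + U32_MAX_DIGITS + 1)

/* Returns the number of characters the full string needs, NUL excluded. */
static int toy_format_link(char *buf, size_t len, int fid)
{
	return snprintf(buf, len, "%d\n", fid);
}

static int toy_format_dev(char *buf, size_t len, char type,
			  unsigned major, unsigned minor)
{
	return snprintf(buf, len, "%c %u %u", type, major, minor);
}
```

With the widest possible values both strings need exactly one byte less than the array size, leaving room for the terminating NUL and nothing more.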

2015-05-11 18:44:03

by Al Viro

Subject: [PATCH v3 002/110] 9p: don't bother with __getname() in ->follow_link()

From: Al Viro <[email protected]>

We copy there a kmalloc'ed string and proceed to kfree that string immediately
after that. Easier to just feed that string to nd_set_link() and _not_
kfree it until ->put_link() (which becomes kfree_put_link() in that case).

Signed-off-by: Al Viro <[email protected]>
---
fs/9p/v9fs.h | 2 --
fs/9p/vfs_inode.c | 93 ++++++++++----------------------------------------
fs/9p/vfs_inode_dotl.c | 31 +++++------------
3 files changed, 26 insertions(+), 100 deletions(-)

diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index fb9ffcb..0923f2c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -149,8 +149,6 @@ extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
-extern void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *p);
extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
struct p9_fid *fid,
struct super_block *sb, int new);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index cda68f7..0ba1171 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1224,103 +1224,46 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
}

/**
- * v9fs_readlink - read a symlink's location (internal version)
+ * v9fs_vfs_follow_link - follow a symlink path
* @dentry: dentry for symlink
- * @buffer: buffer to load symlink location into
- * @buflen: length of buffer
+ * @nd: nameidata
*
*/

-static int v9fs_readlink(struct dentry *dentry, char *buffer, int buflen)
+static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
- int retval;
-
- struct v9fs_session_info *v9ses;
- struct p9_fid *fid;
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ struct p9_fid *fid = v9fs_fid_lookup(dentry);
struct p9_wstat *st;

- p9_debug(P9_DEBUG_VFS, " %pd\n", dentry);
- retval = -EPERM;
- v9ses = v9fs_dentry2v9ses(dentry);
- fid = v9fs_fid_lookup(dentry);
+ p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
+
if (IS_ERR(fid))
- return PTR_ERR(fid);
+ return ERR_CAST(fid);

if (!v9fs_proto_dotu(v9ses))
- return -EBADF;
+ return ERR_PTR(-EBADF);

st = p9_client_stat(fid);
if (IS_ERR(st))
- return PTR_ERR(st);
+ return ERR_CAST(st);

if (!(st->mode & P9_DMSYMLINK)) {
- retval = -EINVAL;
- goto done;
+ p9stat_free(st);
+ kfree(st);
+ return ERR_PTR(-EINVAL);
}
+ if (strlen(st->extension) >= PATH_MAX)
+ st->extension[PATH_MAX - 1] = '\0';

- /* copy extension buffer into buffer */
- retval = min(strlen(st->extension)+1, (size_t)buflen);
- memcpy(buffer, st->extension, retval);
-
- p9_debug(P9_DEBUG_VFS, "%pd -> %s (%.*s)\n",
- dentry, st->extension, buflen, buffer);
-
-done:
+ nd_set_link(nd, st->extension);
+ st->extension = NULL;
p9stat_free(st);
kfree(st);
- return retval;
-}
-
-/**
- * v9fs_vfs_follow_link - follow a symlink path
- * @dentry: dentry for symlink
- * @nd: nameidata
- *
- */
-
-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- int len = 0;
- char *link = __getname();
-
- p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
-
- if (!link)
- link = ERR_PTR(-ENOMEM);
- else {
- len = v9fs_readlink(dentry, link, PATH_MAX);
-
- if (len < 0) {
- __putname(link);
- link = ERR_PTR(len);
- } else
- link[min(len, PATH_MAX-1)] = 0;
- }
- nd_set_link(nd, link);
-
return NULL;
}

/**
- * v9fs_vfs_put_link - release a symlink path
- * @dentry: dentry for symlink
- * @nd: nameidata
- * @p: unused
- *
- */
-
-void
-v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
-{
- char *s = nd_get_link(nd);
-
- p9_debug(P9_DEBUG_VFS, " %pd %s\n",
- dentry, IS_ERR(s) ? "<error>" : s);
- if (!IS_ERR(s))
- __putname(s);
-}
-
-/**
* v9fs_vfs_mkspecial - create a special file
* @dir: inode to create special file in
* @dentry: dentry to create
@@ -1514,7 +1457,7 @@ static const struct inode_operations v9fs_file_inode_operations = {
static const struct inode_operations v9fs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = v9fs_vfs_follow_link,
- .put_link = v9fs_vfs_put_link,
+ .put_link = kfree_put_link,
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
};
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 9861c7c..bc2a91f 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -912,33 +912,18 @@ error:
static void *
v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
{
- int retval;
- struct p9_fid *fid;
- char *link = __getname();
+ struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
+ int retval;

p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

- if (!link) {
- link = ERR_PTR(-ENOMEM);
- goto ndset;
- }
- fid = v9fs_fid_lookup(dentry);
- if (IS_ERR(fid)) {
- __putname(link);
- link = ERR_CAST(fid);
- goto ndset;
- }
+ if (IS_ERR(fid))
+ return ERR_CAST(fid);
retval = p9_client_readlink(fid, &target);
- if (!retval) {
- strcpy(link, target);
- kfree(target);
- goto ndset;
- }
- __putname(link);
- link = ERR_PTR(retval);
-ndset:
- nd_set_link(nd, link);
+ if (retval)
+ return ERR_PTR(retval);
+ nd_set_link(nd, target);
return NULL;
}

@@ -1006,7 +991,7 @@ const struct inode_operations v9fs_file_inode_operations_dotl = {
const struct inode_operations v9fs_symlink_inode_operations_dotl = {
.readlink = generic_readlink,
.follow_link = v9fs_vfs_follow_link_dotl,
- .put_link = v9fs_vfs_put_link,
+ .put_link = kfree_put_link,
.getattr = v9fs_vfs_getattr_dotl,
.setattr = v9fs_vfs_setattr_dotl,
.setxattr = generic_setxattr,
--
2.1.4

2015-05-11 18:45:00

by Al Viro

Subject: [PATCH v3 003/110] ovl: rearrange ovl_follow_link so it doesn't need to call ->put_link

From: NeilBrown <[email protected]>

ovl_follow_link currently calls ->put_link on an error path.
However, ->put_link is about to change in a way that will make it
impossible to call from ovl_follow_link.

So rearrange the code to avoid the need for that error path.
Specifically: move the kmalloc() call before the ->follow_link()
call to the subordinate filesystem.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/overlayfs/inode.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
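
The reordering described above can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the overlayfs code: every name in it (link_data, lower_follow, lower_release, wrapper_follow, wrapper_put) is hypothetical, and it always allocates, whereas the patch only allocates when the lower filesystem has a ->put_link. The point is only that allocating before calling down leaves no error path that has to release the lower layer's cookie.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct link_data {
	void *cookie;		/* whatever the lower layer handed back */
};

int releases;			/* counts calls to the lower layer's release hook */

/* Lower-layer "follow": returns a cookie, or NULL on failure. */
static void *lower_follow(int fail)
{
	static int token;

	return fail ? NULL : &token;
}

/* Lower-layer "release": the call the wrapper must not need on error paths. */
static void lower_release(void *cookie)
{
	(void)cookie;
	releases++;
}

/*
 * Allocate first, then call down. If the allocation fails, lower_follow()
 * has not been called yet, so there is nothing to release. If lower_follow()
 * fails, only our own allocation has to be undone.
 */
struct link_data *wrapper_follow(int alloc_fails, int lower_fails)
{
	struct link_data *data = alloc_fails ? NULL : malloc(sizeof(*data));
	void *cookie;

	if (!data)
		return NULL;

	cookie = lower_follow(lower_fails);
	if (!cookie) {
		free(data);
		return NULL;
	}
	data->cookie = cookie;
	return data;
}

/* Normal teardown: the only place the lower release hook is invoked. */
void wrapper_put(struct link_data *data)
{
	assert(data != NULL);
	lower_release(data->cookie);
	free(data);
}
```

With the old ordering (call down, then allocate), a failed allocation would have required calling lower_release() from inside wrapper_follow(); here both failure paths return without touching it.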

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f1248..1b4b9c5e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
void *ret;
struct dentry *realdentry;
struct inode *realinode;
+ struct ovl_link_data *data = NULL;

realdentry = ovl_dentry_real(dentry);
realinode = realdentry->d_inode;
@@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
if (WARN_ON(!realinode->i_op->follow_link))
return ERR_PTR(-EPERM);

- ret = realinode->i_op->follow_link(realdentry, nd);
- if (IS_ERR(ret))
- return ret;
-
if (realinode->i_op->put_link) {
- struct ovl_link_data *data;
-
data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
- if (!data) {
- realinode->i_op->put_link(realdentry, nd, ret);
+ if (!data)
return ERR_PTR(-ENOMEM);
- }
data->realdentry = realdentry;
- data->cookie = ret;
+ }

- return data;
- } else {
- return NULL;
+ ret = realinode->i_op->follow_link(realdentry, nd);
+ if (IS_ERR(ret)) {
+ kfree(data);
+ return ret;
}
+
+ if (data)
+ data->cookie = ret;
+
+ return data;
}

static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
--
2.1.4

2015-05-11 18:45:04

by Al Viro

Subject: [PATCH v3 004/110] ext4: split inode_operations for encrypted symlinks off the rest

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/inode.c | 6 ++++--
fs/ext4/namei.c | 9 +++++----
fs/ext4/symlink.c | 30 ++++++++++--------------------
4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 009a059..ad358f2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2847,6 +2847,7 @@ extern int ext4_mpage_readpages(struct address_space *mapping,
unsigned nr_pages);

/* symlink.c */
+extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
extern const struct inode_operations ext4_symlink_inode_operations;
extern const struct inode_operations ext4_fast_symlink_inode_operations;

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 55b187c..9f3baa2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4213,8 +4213,10 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &ext4_dir_inode_operations;
inode->i_fop = &ext4_dir_operations;
} else if (S_ISLNK(inode->i_mode)) {
- if (ext4_inode_is_fast_symlink(inode) &&
- !ext4_encrypted_inode(inode)) {
+ if (ext4_encrypted_inode(inode)) {
+ inode->i_op = &ext4_encrypted_symlink_inode_operations;
+ ext4_set_aops(inode);
+ } else if (ext4_inode_is_fast_symlink(inode)) {
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 814f3be..39f8e65 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3206,10 +3206,12 @@ static int ext4_symlink(struct inode *dir,
goto err_drop_inode;
sd->len = cpu_to_le16(ostr.len);
disk_link.name = (char *) sd;
+ inode->i_op = &ext4_encrypted_symlink_inode_operations;
}

if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
- inode->i_op = &ext4_symlink_inode_operations;
+ if (!encryption_required)
+ inode->i_op = &ext4_symlink_inode_operations;
ext4_set_aops(inode);
/*
* We cannot call page_symlink() with transaction started
@@ -3249,9 +3251,8 @@ static int ext4_symlink(struct inode *dir,
} else {
/* clear the extent format for fast symlink */
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
- inode->i_op = encryption_required ?
- &ext4_symlink_inode_operations :
- &ext4_fast_symlink_inode_operations;
+ if (!encryption_required)
+ inode->i_op = &ext4_fast_symlink_inode_operations;
memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
disk_link.len);
inode->i_size = disk_link.len - 1;
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 187b789..49575ff 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -35,9 +35,6 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
int res;
u32 plen, max_size = inode->i_sb->s_blocksize;

- if (!ext4_encrypted_inode(inode))
- return page_follow_link_light(dentry, nd);
-
ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize);
if (IS_ERR(ctx))
return ctx;
@@ -97,18 +94,16 @@ errout:
return ERR_PTR(res);
}

-static void ext4_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
-{
- struct page *page = cookie;
-
- if (!page) {
- kfree(nd_get_link(nd));
- } else {
- kunmap(page);
- page_cache_release(page);
- }
-}
+const struct inode_operations ext4_encrypted_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = ext4_follow_link,
+ .put_link = kfree_put_link,
+ .setattr = ext4_setattr,
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = ext4_listxattr,
+ .removexattr = generic_removexattr,
+};
#endif

static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
@@ -120,13 +115,8 @@ static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)

const struct inode_operations ext4_symlink_inode_operations = {
.readlink = generic_readlink,
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
- .follow_link = ext4_follow_link,
- .put_link = ext4_put_link,
-#else
.follow_link = page_follow_link_light,
.put_link = page_put_link,
-#endif
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
--
2.1.4

2015-05-11 18:43:58

by Al Viro

Subject: [PATCH v3 005/110] libfs: simple_follow_link()

From: Al Viro <[email protected]>

Let "fast" symlinks store the pointer to the body in ->i_link and
use simple_follow_link() for ->follow_link().

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/inode.c | 1 +
fs/libfs.c | 13 +++++++++++++
include/linux/fs.h | 3 +++
3 files changed, 17 insertions(+)
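
As a rough illustration of the idea in the commit message, here is a user-space sketch under stated assumptions. The mini_* types and functions are hypothetical miniatures, not the kernel's struct inode or struct nameidata, and the 60-byte inline area is an arbitrary choice. The filesystem records a pointer to the inline body once, and the generic follow helper then needs no per-filesystem code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space miniatures of the kernel structures. */
struct mini_inode {
	char *i_link;		/* points at the symlink body when stored inline */
	char i_data[60];	/* filesystem-private inline storage (size assumed) */
};

struct mini_nameidata {
	const char *link;	/* what the path walker will traverse next */
};

/* Generic "follow": just hand over the pointer cached in the inode. */
void *mini_simple_follow_link(struct mini_inode *inode,
			      struct mini_nameidata *nd)
{
	assert(inode->i_link != NULL);
	nd->link = inode->i_link;
	return NULL;		/* no cookie, so nothing to release afterwards */
}

/* What a filesystem would do once, at symlink creation or inode read time. */
int mini_set_fast_symlink(struct mini_inode *inode, const char *target)
{
	size_t len = strlen(target) + 1;	/* include the terminating NUL */

	if (len > sizeof(inode->i_data))
		return -1;			/* too long for inline storage */
	inode->i_link = inode->i_data;
	memcpy(inode->i_link, target, len);
	return 0;
}
```

Because the follow step only reads a field of the inode, each filesystem's own one-line ->follow_link() wrapper becomes redundant, which is what the follow-up conversions in this series rely on.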

diff --git a/fs/inode.c b/fs/inode.c
index ea37cd1..952fb48 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -152,6 +152,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
+ inode->i_link = NULL;
inode->i_rdev = 0;
inode->dirtied_when = 0;

diff --git a/fs/libfs.c b/fs/libfs.c
index cb1fb4b..72e4e01 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1093,3 +1093,16 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
return -EINVAL;
}
EXPORT_SYMBOL(simple_nosetlease);
+
+void *simple_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+ nd_set_link(nd, d_inode(dentry)->i_link);
+ return NULL;
+}
+EXPORT_SYMBOL(simple_follow_link);
+
+const struct inode_operations simple_symlink_inode_operations = {
+ .follow_link = simple_follow_link,
+ .readlink = generic_readlink
+};
+EXPORT_SYMBOL(simple_symlink_inode_operations);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35ec87e..0ac758f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -656,6 +656,7 @@ struct inode {
struct pipe_inode_info *i_pipe;
struct block_device *i_bdev;
struct cdev *i_cdev;
+ char *i_link;
};

__u32 i_generation;
@@ -2721,6 +2722,8 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
+void *simple_follow_link(struct dentry *, struct nameidata *);
+extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);

--
2.1.4

2015-05-11 18:44:10

by Al Viro

Subject: [PATCH v3 006/110] ext2: use simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/ext2/inode.c | 1 +
fs/ext2/namei.c | 3 ++-
fs/ext2/symlink.c | 10 +---------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f460ae3..5c09776 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1403,6 +1403,7 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
inode->i_mapping->a_ops = &ext2_aops;
} else if (S_ISLNK(inode->i_mode)) {
if (ext2_inode_is_fast_symlink(inode)) {
+ inode->i_link = (char *)ei->i_data;
inode->i_op = &ext2_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 3e074a9..13ec54a 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -189,7 +189,8 @@ static int ext2_symlink (struct inode * dir, struct dentry * dentry,
} else {
/* fast symlink */
inode->i_op = &ext2_fast_symlink_inode_operations;
- memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
+ inode->i_link = (char*)EXT2_I(inode)->i_data;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
mark_inode_dirty(inode);
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 20608f1..ae17179 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -19,14 +19,6 @@

#include "ext2.h"
#include "xattr.h"
-#include <linux/namei.h>
-
-static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext2_inode_info *ei = EXT2_I(d_inode(dentry));
- nd_set_link(nd, (char *)ei->i_data);
- return NULL;
-}

const struct inode_operations ext2_symlink_inode_operations = {
.readlink = generic_readlink,
@@ -43,7 +35,7 @@ const struct inode_operations ext2_symlink_inode_operations = {

const struct inode_operations ext2_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext2_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ext2_setattr,
#ifdef CONFIG_EXT2_FS_XATTR
.setxattr = generic_setxattr,
--
2.1.4

2015-05-11 18:44:06

by Al Viro

Subject: [PATCH v3 007/110] befs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/befs/linuxvfs.c | 24 +++++-------------------
1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 7943533..172e306 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -43,7 +43,6 @@ static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
static void *befs_follow_link(struct dentry *, struct nameidata *);
-static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -80,11 +79,6 @@ static const struct address_space_operations befs_aops = {
.bmap = befs_bmap,
};

-static const struct inode_operations befs_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = befs_fast_follow_link,
-};
-
static const struct inode_operations befs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = befs_follow_link,
@@ -403,10 +397,12 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &befs_dir_inode_operations;
inode->i_fop = &befs_dir_operations;
} else if (S_ISLNK(inode->i_mode)) {
- if (befs_ino->i_flags & BEFS_LONG_SYMLINK)
+ if (befs_ino->i_flags & BEFS_LONG_SYMLINK) {
inode->i_op = &befs_symlink_inode_operations;
- else
- inode->i_op = &befs_fast_symlink_inode_operations;
+ } else {
+ inode->i_link = befs_ino->i_data.symlink;
+ inode->i_op = &simple_symlink_inode_operations;
+ }
} else {
befs_error(sb, "Inode %lu is not a regular file, "
"directory or symlink. THAT IS WRONG! BeFS has no "
@@ -497,16 +493,6 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
return NULL;
}

-
-static void *
-befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
-
- nd_set_link(nd, befs_ino->i_data.symlink);
- return NULL;
-}
-
/*
* UTF-8 to NLS charset convert routine
*
--
2.1.4

2015-05-11 18:39:29

by Al Viro

Subject: [PATCH v3 008/110] ext3: switch to simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/ext3/inode.c | 1 +
fs/ext3/namei.c | 3 ++-
fs/ext3/symlink.c | 10 +---------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 2ee2dc4..6c7e546 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2999,6 +2999,7 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &ext3_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
+ inode->i_link = (char *)ei->i_data;
} else {
inode->i_op = &ext3_symlink_inode_operations;
ext3_set_aops(inode);
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 4264b9b..c9e767c 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -2308,7 +2308,8 @@ retry:
}
} else {
inode->i_op = &ext3_fast_symlink_inode_operations;
- memcpy((char*)&EXT3_I(inode)->i_data,symname,l);
+ inode->i_link = (char*)&EXT3_I(inode)->i_data;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
EXT3_I(inode)->i_disksize = inode->i_size;
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index ea96df3..c08c590 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -17,17 +17,9 @@
* ext3 symlink handling code
*/

-#include <linux/namei.h>
#include "ext3.h"
#include "xattr.h"

-static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext3_inode_info *ei = EXT3_I(d_inode(dentry));
- nd_set_link(nd, (char*)ei->i_data);
- return NULL;
-}
-
const struct inode_operations ext3_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
@@ -43,7 +35,7 @@ const struct inode_operations ext3_symlink_inode_operations = {

const struct inode_operations ext3_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext3_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
.setxattr = generic_setxattr,
--
2.1.4

2015-05-11 18:43:09

by Al Viro

Subject: [PATCH v3 009/110] ext4: switch to simple_follow_link()

From: Al Viro <[email protected]>

For fast symlinks only, of course; encrypted and page-based symlinks
keep their own ->follow_link().

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/ext4/inode.c | 1 +
fs/ext4/namei.c | 4 +++-
fs/ext4/symlink.c | 9 +--------
3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9f3baa2..066fdd7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4217,6 +4217,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
inode->i_op = &ext4_encrypted_symlink_inode_operations;
ext4_set_aops(inode);
} else if (ext4_inode_is_fast_symlink(inode)) {
+ inode->i_link = (char *)ei->i_data;
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 39f8e65..5fdb9f6a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3251,8 +3251,10 @@ static int ext4_symlink(struct inode *dir,
} else {
/* clear the extent format for fast symlink */
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
- if (!encryption_required)
+ if (!encryption_required) {
inode->i_op = &ext4_fast_symlink_inode_operations;
+ inode->i_link = (char *)&EXT4_I(inode)->i_data;
+ }
memcpy((char *)&EXT4_I(inode)->i_data, disk_link.name,
disk_link.len);
inode->i_size = disk_link.len - 1;
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 49575ff..4264fb1 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -106,13 +106,6 @@ const struct inode_operations ext4_encrypted_symlink_inode_operations = {
};
#endif

-static void *ext4_follow_fast_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ext4_inode_info *ei = EXT4_I(d_inode(dentry));
- nd_set_link(nd, (char *) ei->i_data);
- return NULL;
-}
-
const struct inode_operations ext4_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
@@ -126,7 +119,7 @@ const struct inode_operations ext4_symlink_inode_operations = {

const struct inode_operations ext4_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ext4_follow_fast_link,
+ .follow_link = simple_follow_link,
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
--
2.1.4

2015-05-11 18:43:03

by Al Viro

Subject: [PATCH v3 010/110] jffs2: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/jffs2/dir.c | 1 +
fs/jffs2/fs.c | 1 +
fs/jffs2/symlink.c | 45 +--------------------------------------------
3 files changed, 3 insertions(+), 44 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 1ba5c97..8118002 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -354,6 +354,7 @@ static int jffs2_symlink (struct inode *dir_i, struct dentry *dentry, const char
ret = -ENOMEM;
goto fail;
}
+ inode->i_link = f->target;

jffs2_dbg(1, "%s(): symlink's target '%s' cached\n",
__func__, (char *)f->target);
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index fe5ea08..60d86e8 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -294,6 +294,7 @@ struct inode *jffs2_iget(struct super_block *sb, unsigned long ino)

case S_IFLNK:
inode->i_op = &jffs2_symlink_inode_operations;
+ inode->i_link = f->target;
break;

case S_IFDIR:
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index 1fefa25..8ce2f24 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -9,58 +9,15 @@
*
*/

-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kernel.h>
-#include <linux/fs.h>
-#include <linux/namei.h>
#include "nodelist.h"

-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd);
-
const struct inode_operations jffs2_symlink_inode_operations =
{
.readlink = generic_readlink,
- .follow_link = jffs2_follow_link,
+ .follow_link = simple_follow_link,
.setattr = jffs2_setattr,
.setxattr = jffs2_setxattr,
.getxattr = jffs2_getxattr,
.listxattr = jffs2_listxattr,
.removexattr = jffs2_removexattr
};
-
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct jffs2_inode_info *f = JFFS2_INODE_INFO(d_inode(dentry));
- char *p = (char *)f->target;
-
- /*
- * We don't acquire the f->sem mutex here since the only data we
- * use is f->target.
- *
- * 1. If we are here the inode has already built and f->target has
- * to point to the target path.
- * 2. Nobody uses f->target (if the inode is symlink's inode). The
- * exception is inode freeing function which frees f->target. But
- * it can't be called while we are here and before VFS has
- * stopped using our f->target string which we provide by means of
- * nd_set_link() call.
- */
-
- if (!p) {
- pr_err("%s(): can't find symlink target\n", __func__);
- p = ERR_PTR(-EIO);
- }
- jffs2_dbg(1, "%s(): target path is '%s'\n",
- __func__, (char *)f->target);
-
- nd_set_link(nd, p);
-
- /*
- * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
- * since the only way that may cause f->target to be changed is iput() operation.
- * But VFS will not use f->target after iput() has been called.
- */
- return NULL;
-}
-
--
2.1.4

2015-05-11 18:43:07

by Al Viro

Subject: [PATCH v3 011/110] shmem: switch to simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
mm/shmem.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index de98137..7f6e2f8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2451,6 +2451,7 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return -ENOMEM;
}
inode->i_op = &shmem_short_symlink_operations;
+ inode->i_link = info->symlink;
} else {
error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
if (error) {
@@ -2474,12 +2475,6 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, SHMEM_I(d_inode(dentry))->symlink);
- return NULL;
-}
-
static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct page *page = NULL;
@@ -2642,7 +2637,7 @@ static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)

static const struct inode_operations shmem_short_symlink_operations = {
.readlink = generic_readlink,
- .follow_link = shmem_follow_short_symlink,
+ .follow_link = simple_follow_link,
#ifdef CONFIG_TMPFS_XATTR
.setxattr = shmem_setxattr,
.getxattr = shmem_getxattr,
--
2.1.4

2015-05-11 18:41:39

by Al Viro

Subject: [PATCH v3 012/110] debugfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/debugfs/file.c | 12 ------------
fs/debugfs/inode.c | 6 +++---
include/linux/debugfs.h | 1 -
3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 830a7e7..284f9aa 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -17,7 +17,6 @@
#include <linux/fs.h>
#include <linux/seq_file.h>
#include <linux/pagemap.h>
-#include <linux/namei.h>
#include <linux/debugfs.h>
#include <linux/io.h>
#include <linux/slab.h>
@@ -43,17 +42,6 @@ const struct file_operations debugfs_file_operations = {
.llseek = noop_llseek,
};

-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, d_inode(dentry)->i_private);
- return NULL;
-}
-
-const struct inode_operations debugfs_link_operations = {
- .readlink = generic_readlink,
- .follow_link = debugfs_follow_link,
-};
-
static int debugfs_u8_set(void *data, u64 val)
{
*(u8 *)data = val;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index c1e7ffb..7eaec88 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -174,7 +174,7 @@ static void debugfs_evict_inode(struct inode *inode)
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
if (S_ISLNK(inode->i_mode))
- kfree(inode->i_private);
+ kfree(inode->i_link);
}

static const struct super_operations debugfs_super_operations = {
@@ -511,8 +511,8 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
return failed_creating(dentry);
}
inode->i_mode = S_IFLNK | S_IRWXUGO;
- inode->i_op = &debugfs_link_operations;
- inode->i_private = link;
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = link;
d_instantiate(dentry, inode);
return end_creating(dentry);
}
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index cb25af4..420311b 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -45,7 +45,6 @@ extern struct dentry *arch_debugfs_dir;

/* declared over in file.c */
extern const struct file_operations debugfs_file_operations;
-extern const struct inode_operations debugfs_link_operations;

struct dentry *debugfs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
--
2.1.4

2015-05-11 18:41:37

by Al Viro

Subject: [PATCH v3 013/110] ufs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ufs/inode.c | 5 +++--
fs/ufs/namei.c | 3 ++-
fs/ufs/symlink.c | 13 +------------
3 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index be7d42c..99aaf5c 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -572,9 +572,10 @@ static void ufs_set_inode_ops(struct inode *inode)
inode->i_fop = &ufs_dir_operations;
inode->i_mapping->a_ops = &ufs_aops;
} else if (S_ISLNK(inode->i_mode)) {
- if (!inode->i_blocks)
+ if (!inode->i_blocks) {
inode->i_op = &ufs_fast_symlink_inode_operations;
- else {
+ inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;
+ } else {
inode->i_op = &ufs_symlink_inode_operations;
inode->i_mapping->a_ops = &ufs_aops;
}
diff --git a/fs/ufs/namei.c b/fs/ufs/namei.c
index e491a93..f773deb 100644
--- a/fs/ufs/namei.c
+++ b/fs/ufs/namei.c
@@ -144,7 +144,8 @@ static int ufs_symlink (struct inode * dir, struct dentry * dentry,
} else {
/* fast symlink */
inode->i_op = &ufs_fast_symlink_inode_operations;
- memcpy(UFS_I(inode)->i_u1.i_symlink, symname, l);
+ inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;
+ memcpy(inode->i_link, symname, l);
inode->i_size = l-1;
}
mark_inode_dirty(inode);
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index 5b537e2..874480b 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -25,23 +25,12 @@
* ext2 symlink handling code
*/

-#include <linux/fs.h>
-#include <linux/namei.h>
-
#include "ufs_fs.h"
#include "ufs.h"

-
-static void *ufs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ufs_inode_info *p = UFS_I(d_inode(dentry));
- nd_set_link(nd, (char*)p->i_u1.i_symlink);
- return NULL;
-}
-
const struct inode_operations ufs_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ufs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ufs_setattr,
};

--
2.1.4

2015-05-11 18:41:34

by Al Viro

Subject: [PATCH v3 014/110] ubifs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ubifs/dir.c | 1 +
fs/ubifs/file.c | 11 +----------
fs/ubifs/super.c | 1 +
3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 27060fc..5c27c66 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -889,6 +889,7 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,

memcpy(ui->data, symname, len);
((char *)ui->data)[len] = '\0';
+ inode->i_link = ui->data;
/*
* The terminating zero byte is not written to the flash media and it
* is put just to make later in-memory string processing simpler. Thus,
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 35efc10..a3dfe2a 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -51,7 +51,6 @@

#include "ubifs.h"
#include <linux/mount.h>
-#include <linux/namei.h>
#include <linux/slab.h>

static int read_block(struct inode *inode, void *addr, unsigned int block,
@@ -1300,14 +1299,6 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
ClearPageChecked(page);
}

-static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ubifs_inode *ui = ubifs_inode(d_inode(dentry));
-
- nd_set_link(nd, ui->data);
- return NULL;
-}
-
int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file->f_mapping->host;
@@ -1570,7 +1561,7 @@ const struct inode_operations ubifs_file_inode_operations = {

const struct inode_operations ubifs_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = ubifs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ubifs_setattr,
.getattr = ubifs_getattr,
.setxattr = ubifs_setxattr,
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 75e6f04..20f5dbd 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -195,6 +195,7 @@ struct inode *ubifs_iget(struct super_block *sb, unsigned long inum)
}
memcpy(ui->data, ino->data, ui->data_len);
((char *)ui->data)[ui->data_len] = '\0';
+ inode->i_link = ui->data;
break;
case S_IFBLK:
case S_IFCHR:
--
2.1.4

2015-05-11 18:41:32

by Al Viro

Subject: [PATCH v3 015/110] sysv: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/sysv/Makefile | 2 +-
fs/sysv/inode.c | 5 +++--
fs/sysv/symlink.c | 20 --------------------
fs/sysv/sysv.h | 1 -
4 files changed, 4 insertions(+), 24 deletions(-)
delete mode 100644 fs/sysv/symlink.c

diff --git a/fs/sysv/Makefile b/fs/sysv/Makefile
index 3591f9d..7a75e70 100644
--- a/fs/sysv/Makefile
+++ b/fs/sysv/Makefile
@@ -5,4 +5,4 @@
obj-$(CONFIG_SYSV_FS) += sysv.o

sysv-objs := ialloc.o balloc.o inode.o itree.o file.o dir.o \
- namei.o super.o symlink.o
+ namei.o super.o
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index 8895630..590ad92 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -166,8 +166,9 @@ void sysv_set_inode(struct inode *inode, dev_t rdev)
inode->i_op = &sysv_symlink_inode_operations;
inode->i_mapping->a_ops = &sysv_aops;
} else {
- inode->i_op = &sysv_fast_symlink_inode_operations;
- nd_terminate_link(SYSV_I(inode)->i_data, inode->i_size,
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)SYSV_I(inode)->i_data;
+ nd_terminate_link(inode->i_link, inode->i_size,
sizeof(SYSV_I(inode)->i_data) - 1);
}
} else
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
deleted file mode 100644
index d3fa0d7..0000000
--- a/fs/sysv/symlink.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * linux/fs/sysv/symlink.c
- *
- * Handling of System V filesystem fast symlinks extensions.
- * Aug 2001, Christoph Hellwig ([email protected])
- */
-
-#include "sysv.h"
-#include <linux/namei.h>
-
-static void *sysv_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, (char *)SYSV_I(d_inode(dentry))->i_data);
- return NULL;
-}
-
-const struct inode_operations sysv_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = sysv_follow_link,
-};
diff --git a/fs/sysv/sysv.h b/fs/sysv/sysv.h
index 69d4889..2c13525 100644
--- a/fs/sysv/sysv.h
+++ b/fs/sysv/sysv.h
@@ -161,7 +161,6 @@ extern ino_t sysv_inode_by_name(struct dentry *);

extern const struct inode_operations sysv_file_inode_operations;
extern const struct inode_operations sysv_dir_inode_operations;
-extern const struct inode_operations sysv_fast_symlink_inode_operations;
extern const struct file_operations sysv_file_operations;
extern const struct file_operations sysv_dir_operations;
extern const struct address_space_operations sysv_aops;
--
2.1.4

2015-05-11 18:41:30

by Al Viro

Subject: [PATCH v3 016/110] jfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Reviewed-by: Jan Kara <[email protected]>
Acked-by: Dave Kleikamp <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/jfs/inode.c | 3 ++-
fs/jfs/namei.c | 5 ++---
fs/jfs/symlink.c | 10 +---------
3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 070dc4b..6f1cb2b 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -63,11 +63,12 @@ struct inode *jfs_iget(struct super_block *sb, unsigned long ino)
inode->i_mapping->a_ops = &jfs_aops;
} else {
inode->i_op = &jfs_fast_symlink_inode_operations;
+ inode->i_link = JFS_IP(inode)->i_inline;
/*
* The inline data should be null-terminated, but
* don't let on-disk corruption crash the kernel
*/
- JFS_IP(inode)->i_inline[inode->i_size] = '\0';
+ inode->i_link[inode->i_size] = '\0';
}
} else {
inode->i_op = &jfs_file_inode_operations;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 66db7bc..e33be92 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -880,7 +880,6 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
int ssize; /* source pathname size */
struct btstack btstack;
struct inode *ip = d_inode(dentry);
- unchar *i_fastsymlink;
s64 xlen = 0;
int bmask = 0, xsize;
s64 xaddr;
@@ -946,8 +945,8 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
if (ssize <= IDATASIZE) {
ip->i_op = &jfs_fast_symlink_inode_operations;

- i_fastsymlink = JFS_IP(ip)->i_inline;
- memcpy(i_fastsymlink, name, ssize);
+ ip->i_link = JFS_IP(ip)->i_inline;
+ memcpy(ip->i_link, name, ssize);
ip->i_size = ssize - 1;

/*
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 80f42bc..5929e23 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -17,21 +17,13 @@
*/

#include <linux/fs.h>
-#include <linux/namei.h>
#include "jfs_incore.h"
#include "jfs_inode.h"
#include "jfs_xattr.h"

-static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- char *s = JFS_IP(d_inode(dentry))->i_inline;
- nd_set_link(nd, s);
- return NULL;
-}
-
const struct inode_operations jfs_fast_symlink_inode_operations = {
.readlink = generic_readlink,
- .follow_link = jfs_follow_link,
+ .follow_link = simple_follow_link,
.setattr = jfs_setattr,
.setxattr = jfs_setxattr,
.getxattr = jfs_getxattr,
--
2.1.4

2015-05-11 18:41:26

by Al Viro

Subject: [PATCH v3 017/110] freevxfs: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/freevxfs/vxfs_extern.h | 3 ---
fs/freevxfs/vxfs_immed.c | 34 ----------------------------------
fs/freevxfs/vxfs_inode.c | 7 +++++--
3 files changed, 5 insertions(+), 39 deletions(-)

diff --git a/fs/freevxfs/vxfs_extern.h b/fs/freevxfs/vxfs_extern.h
index 881aa3d..e3dcb44 100644
--- a/fs/freevxfs/vxfs_extern.h
+++ b/fs/freevxfs/vxfs_extern.h
@@ -50,9 +50,6 @@ extern daddr_t vxfs_bmap1(struct inode *, long);
/* vxfs_fshead.c */
extern int vxfs_read_fshead(struct super_block *);

-/* vxfs_immed.c */
-extern const struct inode_operations vxfs_immed_symlink_iops;
-
/* vxfs_inode.c */
extern const struct address_space_operations vxfs_immed_aops;
extern struct kmem_cache *vxfs_inode_cachep;
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index 8b9229e..cb84f0f 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -32,29 +32,15 @@
*/
#include <linux/fs.h>
#include <linux/pagemap.h>
-#include <linux/namei.h>

#include "vxfs.h"
#include "vxfs_extern.h"
#include "vxfs_inode.h"


-static void * vxfs_immed_follow_link(struct dentry *, struct nameidata *);
-
static int vxfs_immed_readpage(struct file *, struct page *);

/*
- * Inode operations for immed symlinks.
- *
- * Unliked all other operations we do not go through the pagecache,
- * but do all work directly on the inode.
- */
-const struct inode_operations vxfs_immed_symlink_iops = {
- .readlink = generic_readlink,
- .follow_link = vxfs_immed_follow_link,
-};
-
-/*
* Address space operations for immed files and directories.
*/
const struct address_space_operations vxfs_immed_aops = {
@@ -62,26 +48,6 @@ const struct address_space_operations vxfs_immed_aops = {
};

/**
- * vxfs_immed_follow_link - follow immed symlink
- * @dp: dentry for the link
- * @np: pathname lookup data for the current path walk
- *
- * Description:
- * vxfs_immed_follow_link restarts the pathname lookup with
- * the data obtained from @dp.
- *
- * Returns:
- * Zero on success, else a negative error code.
- */
-static void *
-vxfs_immed_follow_link(struct dentry *dp, struct nameidata *np)
-{
- struct vxfs_inode_info *vip = VXFS_INO(d_inode(dp));
- nd_set_link(np, vip->vii_immed.vi_immed);
- return NULL;
-}
-
-/**
* vxfs_immed_readpage - read part of an immed inode into pagecache
* @file: file context (unused)
* @page: page frame to fill in.
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 363e3ae..ef73ed6 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -35,6 +35,7 @@
#include <linux/pagemap.h>
#include <linux/kernel.h>
#include <linux/slab.h>
+#include <linux/namei.h>

#include "vxfs.h"
#include "vxfs_inode.h"
@@ -327,8 +328,10 @@ vxfs_iget(struct super_block *sbp, ino_t ino)
ip->i_op = &page_symlink_inode_operations;
ip->i_mapping->a_ops = &vxfs_aops;
} else {
- ip->i_op = &vxfs_immed_symlink_iops;
- vip->vii_immed.vi_immed[ip->i_size] = '\0';
+ ip->i_op = &simple_symlink_inode_operations;
+ ip->i_link = vip->vii_immed.vi_immed;
+ nd_terminate_link(ip->i_link, ip->i_size,
+ sizeof(vip->vii_immed.vi_immed) - 1);
}
} else
init_special_inode(ip, ip->i_mode, old_decode_dev(vip->vii_rdev));
--
2.1.4

2015-05-11 18:39:34

by Al Viro

Subject: [PATCH v3 018/110] exofs: switch to {simple,page}_symlink_inode_operations

From: Al Viro <[email protected]>

ACK-by: Boaz Harrosh <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/exofs/Kbuild | 2 +-
fs/exofs/exofs.h | 4 ----
fs/exofs/inode.c | 9 +++++----
fs/exofs/namei.c | 5 +++--
fs/exofs/symlink.c | 55 ------------------------------------------------------
5 files changed, 9 insertions(+), 66 deletions(-)
delete mode 100644 fs/exofs/symlink.c

diff --git a/fs/exofs/Kbuild b/fs/exofs/Kbuild
index b47c7b8..a364fd0 100644
--- a/fs/exofs/Kbuild
+++ b/fs/exofs/Kbuild
@@ -16,5 +16,5 @@
libore-y := ore.o ore_raid.o
obj-$(CONFIG_ORE) += libore.o

-exofs-y := inode.o file.o symlink.o namei.o dir.o super.o sys.o
+exofs-y := inode.o file.o namei.o dir.o super.o sys.o
obj-$(CONFIG_EXOFS_FS) += exofs.o
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index ad9cac6..2e86086 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -207,10 +207,6 @@ extern const struct address_space_operations exofs_aops;
extern const struct inode_operations exofs_dir_inode_operations;
extern const struct inode_operations exofs_special_inode_operations;

-/* symlink.c */
-extern const struct inode_operations exofs_symlink_inode_operations;
-extern const struct inode_operations exofs_fast_symlink_inode_operations;
-
/* exofs_init_comps will initialize an ore_components device array
* pointing to a single ore_comp struct, and a round-robin view
* of the device table.
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 786e4cc..73c64da 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -1222,10 +1222,11 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
inode->i_fop = &exofs_dir_operations;
inode->i_mapping->a_ops = &exofs_aops;
} else if (S_ISLNK(inode->i_mode)) {
- if (exofs_inode_is_fast_symlink(inode))
- inode->i_op = &exofs_fast_symlink_inode_operations;
- else {
- inode->i_op = &exofs_symlink_inode_operations;
+ if (exofs_inode_is_fast_symlink(inode)) {
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)oi->i_data;
+ } else {
+ inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &exofs_aops;
}
} else {
diff --git a/fs/exofs/namei.c b/fs/exofs/namei.c
index 5ae25e4..09a6bb1 100644
--- a/fs/exofs/namei.c
+++ b/fs/exofs/namei.c
@@ -113,7 +113,7 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
oi = exofs_i(inode);
if (l > sizeof(oi->i_data)) {
/* slow symlink */
- inode->i_op = &exofs_symlink_inode_operations;
+ inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &exofs_aops;
memset(oi->i_data, 0, sizeof(oi->i_data));

@@ -122,7 +122,8 @@ static int exofs_symlink(struct inode *dir, struct dentry *dentry,
goto out_fail;
} else {
/* fast symlink */
- inode->i_op = &exofs_fast_symlink_inode_operations;
+ inode->i_op = &simple_symlink_inode_operations;
+ inode->i_link = (char *)oi->i_data;
memcpy(oi->i_data, symname, l);
inode->i_size = l-1;
}
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
deleted file mode 100644
index 6f6f3a4..0000000
--- a/fs/exofs/symlink.c
+++ /dev/null
@@ -1,55 +0,0 @@
-/*
- * Copyright (C) 2005, 2006
- * Avishay Traeger ([email protected])
- * Copyright (C) 2008, 2009
- * Boaz Harrosh <[email protected]>
- *
- * Copyrights for code taken from ext2:
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card ([email protected])
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- * from
- * linux/fs/minix/inode.c
- * Copyright (C) 1991, 1992 Linus Torvalds
- *
- * This file is part of exofs.
- *
- * exofs is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation. Since it is based on ext2, and the only
- * valid version of GPL for the Linux kernel is version 2, the only valid
- * version of GPL for exofs is version 2.
- *
- * exofs is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with exofs; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include <linux/namei.h>
-
-#include "exofs.h"
-
-static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct exofs_i_info *oi = exofs_i(d_inode(dentry));
-
- nd_set_link(nd, (char *)oi->i_data);
- return NULL;
-}
-
-const struct inode_operations exofs_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = page_follow_link_light,
- .put_link = page_put_link,
-};
-
-const struct inode_operations exofs_fast_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = exofs_follow_link,
-};
--
2.1.4

2015-05-11 18:39:32

by Al Viro

Subject: [PATCH v3 019/110] ceph: switch to simple_follow_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/ceph/inode.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e876e19..571acd8 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -6,7 +6,6 @@
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/kernel.h>
-#include <linux/namei.h>
#include <linux/writeback.h>
#include <linux/vmalloc.h>
#include <linux/posix_acl.h>
@@ -819,6 +818,7 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
else
kfree(sym); /* lost a race */
}
+ inode->i_link = ci->i_symlink;
break;
case S_IFDIR:
inode->i_op = &ceph_dir_iops;
@@ -1691,16 +1691,9 @@ retry:
/*
* symlinks
*/
-static void *ceph_sym_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- struct ceph_inode_info *ci = ceph_inode(d_inode(dentry));
- nd_set_link(nd, ci->i_symlink);
- return NULL;
-}
-
static const struct inode_operations ceph_symlink_iops = {
.readlink = generic_readlink,
- .follow_link = ceph_sym_follow_link,
+ .follow_link = simple_follow_link,
.setattr = ceph_setattr,
.getattr = ceph_getattr,
.setxattr = ceph_setxattr,
--
2.1.4

2015-05-11 18:39:24

by Al Viro

Subject: [PATCH v3 020/110] logfs: fix a pagecache leak for symlinks

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/logfs/dir.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/logfs/dir.c b/fs/logfs/dir.c
index 4cf38f1..f9b45d4 100644
--- a/fs/logfs/dir.c
+++ b/fs/logfs/dir.c
@@ -779,6 +779,7 @@ fail:
const struct inode_operations logfs_symlink_iops = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
+ .put_link = page_put_link,
};

const struct inode_operations logfs_dir_iops = {
--
2.1.4

2015-05-11 18:39:21

by Al Viro

Subject: [PATCH v3 021/110] SECURITY: remove nameidata arg from inode_follow_link.

From: NeilBrown <[email protected]>

No ->inode_follow_link() method uses the nameidata arg, and
struct nameidata is about to become private to namei.c.
So remove the argument from all inode_follow_link() functions.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
include/linux/security.h | 9 +++------
security/capability.c | 3 +--
security/security.c | 4 ++--
security/selinux/hooks.c | 2 +-
5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe30d3b..7f20b40 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -871,7 +871,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
touch_atime(link);
nd_set_link(nd, NULL);

- error = security_inode_follow_link(link->dentry, nd);
+ error = security_inode_follow_link(dentry);
if (error)
goto out_put_nd_path;

diff --git a/include/linux/security.h b/include/linux/security.h
index 18264ea..62a6620 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -43,7 +43,6 @@ struct file;
struct vfsmount;
struct path;
struct qstr;
-struct nameidata;
struct iattr;
struct fown_struct;
struct file_operations;
@@ -477,7 +476,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* @inode_follow_link:
* Check permission to follow a symbolic link when looking up a pathname.
* @dentry contains the dentry structure for the link.
- * @nd contains the nameidata structure for the parent directory.
* Return 0 if permission is granted.
* @inode_permission:
* Check permission before accessing an inode. This hook is called by the
@@ -1553,7 +1551,7 @@ struct security_operations {
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
int (*inode_readlink) (struct dentry *dentry);
- int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
+ int (*inode_follow_link) (struct dentry *dentry);
int (*inode_permission) (struct inode *inode, int mask);
int (*inode_setattr) (struct dentry *dentry, struct iattr *attr);
int (*inode_getattr) (const struct path *path);
@@ -1839,7 +1837,7 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags);
int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
+int security_inode_follow_link(struct dentry *dentry);
int security_inode_permission(struct inode *inode, int mask);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
int security_inode_getattr(const struct path *path);
@@ -2241,8 +2239,7 @@ static inline int security_inode_readlink(struct dentry *dentry)
return 0;
}

-static inline int security_inode_follow_link(struct dentry *dentry,
- struct nameidata *nd)
+static inline int security_inode_follow_link(struct dentry *dentry)
{
return 0;
}
diff --git a/security/capability.c b/security/capability.c
index 0d03fcc..fae99db 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,8 +209,7 @@ static int cap_inode_readlink(struct dentry *dentry)
return 0;
}

-static int cap_inode_follow_link(struct dentry *dentry,
- struct nameidata *nameidata)
+static int cap_inode_follow_link(struct dentry *dentry)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 8e9b1f4..d7c30b0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,11 @@ int security_inode_readlink(struct dentry *dentry)
return security_ops->inode_readlink(dentry);
}

-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
+int security_inode_follow_link(struct dentry *dentry)
{
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
- return security_ops->inode_follow_link(dentry, nd);
+ return security_ops->inode_follow_link(dentry);
}

int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7dade28..8016229 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2861,7 +2861,7 @@ static int selinux_inode_readlink(struct dentry *dentry)
return dentry_has_perm(cred, dentry, FILE__READ);
}

-static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *nameidata)
+static int selinux_inode_follow_link(struct dentry *dentry)
{
const struct cred *cred = current_cred();

--
2.1.4

2015-05-11 18:39:19

by Al Viro

Subject: [PATCH v3 022/110] uninline walk_component()

From: Al Viro <[email protected]>

seriously improves the stack *and* I-cache footprint...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7f20b40..a77f9ca 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1569,8 +1569,7 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static inline int walk_component(struct nameidata *nd, struct path *path,
- int follow)
+static int walk_component(struct nameidata *nd, struct path *path, int follow)
{
struct inode *inode;
int err;
--
2.1.4

2015-05-11 18:08:28

by Al Viro

Subject: [PATCH v3 023/110] namei: take O_NOFOLLOW treatment into do_last()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a77f9ca..4c1a8bf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3059,6 +3059,11 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
+ if (!(nd->flags & LOOKUP_FOLLOW)) {
+ path_put_conditional(path, nd);
+ path_put(&nd->path);
+ return -ELOOP;
+ }
return 1;
}

@@ -3243,12 +3248,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = path;
void *cookie;
- if (!(nd->flags & LOOKUP_FOLLOW)) {
- path_put_conditional(&path, nd);
- path_put(&nd->path);
- error = -ELOOP;
- break;
- }
error = may_follow_link(&link, nd);
if (unlikely(error))
break;
--
2.1.4

2015-05-11 18:08:41

by Al Viro

Subject: [PATCH v3 024/110] do_last: kill symlink_ok

From: Al Viro <[email protected]>

When O_PATH is present, O_CREAT isn't, so symlink_ok is always equal to
(open_flags & O_PATH) && !(nd->flags & LOOKUP_FOLLOW).
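
[Editor's sketch, not part of the patch.] The claim above is a plain
De Morgan identity: !(A && !B) equals !A || B. A minimal userspace check,
assuming made-up bit values (DEMO_O_PATH and DEMO_LOOKUP_FOLLOW are
hypothetical stand-ins, not the kernel constants) and ignoring O_CREAT,
since the message already notes it cannot accompany O_PATH:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define DEMO_O_PATH        0x1u
#define DEMO_LOOKUP_FOLLOW 0x2u

/* Old form: a local flag, negated at the should_follow_link() call site. */
static bool demo_old_follow(unsigned int open_flag, unsigned int nd_flags)
{
	bool symlink_ok = false;

	if ((open_flag & DEMO_O_PATH) && !(nd_flags & DEMO_LOOKUP_FOLLOW))
		symlink_ok = true;
	return !symlink_ok;
}

/* New form: the expression the patch passes directly. */
static bool demo_new_follow(unsigned int open_flag, unsigned int nd_flags)
{
	return !(open_flag & DEMO_O_PATH) || (nd_flags & DEMO_LOOKUP_FOLLOW);
}

/* Compare both forms over all four flag combinations; returns cases checked. */
static int demo_check_equivalence(void)
{
	int checked = 0;

	for (unsigned int i = 0; i < 4; i++) {
		unsigned int open_flag = (i & 1) ? DEMO_O_PATH : 0;
		unsigned int nd_flags = (i & 2) ? DEMO_LOOKUP_FOLLOW : 0;

		assert(demo_old_follow(open_flag, nd_flags) ==
		       demo_new_follow(open_flag, nd_flags));
		checked++;
	}
	return checked;
}
```

Calling demo_check_equivalence() walks every combination of the two bits,
so a mismatch between the old local and the new inline expression would
trip the assert.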

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4c1a8bf..3e45233 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2931,7 +2931,6 @@ static int do_last(struct nameidata *nd, struct path *path,
bool got_write = false;
int acc_mode = op->acc_mode;
struct inode *inode;
- bool symlink_ok = false;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
bool retried = false;
int error;
@@ -2949,8 +2948,6 @@ static int do_last(struct nameidata *nd, struct path *path,
if (!(open_flag & O_CREAT)) {
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
- if (open_flag & O_PATH && !(nd->flags & LOOKUP_FOLLOW))
- symlink_ok = true;
/* we _can_ be in RCU mode here */
error = lookup_fast(nd, path, &inode);
if (likely(!error))
@@ -3050,7 +3047,8 @@ retry_lookup:
}
finish_lookup:
/* we _can_ be in RCU mode here */
- if (should_follow_link(path->dentry, !symlink_ok)) {
+ if (should_follow_link(path->dentry,
+ !(open_flag & O_PATH) || (nd->flags & LOOKUP_FOLLOW))) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path->mnt ||
unlazy_walk(nd, path->dentry))) {
--
2.1.4

2015-05-11 18:08:30

by Al Viro

Subject: [PATCH v3 025/110] do_last: regularize the logics around following symlinks

From: Al Viro <[email protected]>

With LOOKUP_FOLLOW we unlazy and return 1; without it we either
fail with ELOOP or, for O_PATH opens, succeed. No need to mix
those cases...
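
[Editor's sketch, not part of the patch.] The three outcomes described
above can be written as a small decision function. This is a userspace
model under stated assumptions: the booleans stand in for
d_is_symlink(), nd->flags & LOOKUP_FOLLOW and open_flag & O_PATH, and
the demo_* names are hypothetical. RCU handling and reference counting
are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { DEMO_PROCEED = 0, DEMO_FOLLOW = 1 };

/*
 * Model of the tail of do_last() after this patch:
 *   symlink + LOOKUP_FOLLOW      -> return 1, caller follows the link
 *   symlink, no follow, !O_PATH  -> -ELOOP
 *   anything else                -> carry on with the open
 */
static int demo_finish_lookup(bool is_symlink, bool lookup_follow, bool o_path)
{
	if (is_symlink && lookup_follow)
		return DEMO_FOLLOW;
	if (is_symlink && !o_path)
		return -ELOOP;
	return DEMO_PROCEED;
}

/* The three symlink cases named in the commit message. */
static void demo_check_cases(void)
{
	assert(demo_finish_lookup(true, true, false) == DEMO_FOLLOW);
	assert(demo_finish_lookup(true, false, false) == -ELOOP);
	assert(demo_finish_lookup(true, false, true) == DEMO_PROCEED);
}
```

Keeping the "follow" test separate from the "refuse" test is the point of
the change: the first branch no longer has to know about O_PATH at all.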

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3e45233..54cbfe7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3046,9 +3046,7 @@ retry_lookup:
goto out;
}
finish_lookup:
- /* we _can_ be in RCU mode here */
- if (should_follow_link(path->dentry,
- !(open_flag & O_PATH) || (nd->flags & LOOKUP_FOLLOW))) {
+ if (should_follow_link(path->dentry, nd->flags & LOOKUP_FOLLOW)) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path->mnt ||
unlazy_walk(nd, path->dentry))) {
@@ -3057,14 +3055,15 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
- if (!(nd->flags & LOOKUP_FOLLOW)) {
- path_put_conditional(path, nd);
- path_put(&nd->path);
- return -ELOOP;
- }
return 1;
}

+ if (unlikely(d_is_symlink(path->dentry)) && !(open_flag & O_PATH)) {
+ path_to_nameidata(path, nd);
+ error = -ELOOP;
+ goto out;
+ }
+
if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path->mnt) {
path_to_nameidata(path, nd);
} else {
--
2.1.4

2015-05-11 18:08:38

by Al Viro

Subject: [PATCH v3 026/110] namei: get rid of lookup_hash()

From: Al Viro <[email protected]>

it's a convenient helper, but we'll want to shift nameidata
down the call chain, so it won't be available there...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 54cbfe7..1439aa3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2128,16 +2128,6 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(vfs_path_lookup);

-/*
- * Restricted form of lookup. Doesn't follow links, single-component only,
- * needs parent already locked. Doesn't follow mounts.
- * SMP-safe.
- */
-static struct dentry *lookup_hash(struct nameidata *nd)
-{
- return __lookup_hash(&nd->last, nd->path.dentry, nd->flags);
-}
-
/**
* lookup_one_len - filesystem helper to lookup single pathname component
* @name: pathname component to lookup
@@ -3351,7 +3341,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
* Do the final lookup.
*/
mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
if (IS_ERR(dentry))
goto unlock;

@@ -3665,7 +3655,7 @@ retry:
goto exit1;

mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
@@ -3785,7 +3775,7 @@ retry:
goto exit1;
retry_deleg:
mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
+ dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
/* Why not before? Because we want correct error value */
@@ -4304,7 +4294,7 @@ retry:
retry_deleg:
trap = lock_rename(new_dir, old_dir);

- old_dentry = lookup_hash(&oldnd);
+ old_dentry = __lookup_hash(&oldnd.last, oldnd.path.dentry, oldnd.flags);
error = PTR_ERR(old_dentry);
if (IS_ERR(old_dentry))
goto exit3;
@@ -4312,7 +4302,7 @@ retry_deleg:
error = -ENOENT;
if (d_is_negative(old_dentry))
goto exit4;
- new_dentry = lookup_hash(&newnd);
+ new_dentry = __lookup_hash(&newnd.last, newnd.path.dentry, newnd.flags);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto exit4;
--
2.1.4

2015-05-11 18:36:33

by Al Viro

Subject: [PATCH v3 027/110] namei: shift nameidata down into user_path_parent()

From: Al Viro <[email protected]>

that avoids having nameidata on stack during the calls of
->rmdir()/->unlink() and *two* of those during the calls
of ->rename().
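
[Editor's sketch, not part of the patch.] The stack saving comes from a
general pattern: the helper keeps the large walk state as its own local
and hands back only the few small fields callers need. A userspace
model, assuming hypothetical demo_* types in place of struct path,
struct qstr and struct nameidata, and a fake "walk" that merely splits
the string at the last slash:

```c
#include <assert.h>
#include <string.h>

struct demo_path { int mnt; int dentry; };
struct demo_qstr { const char *name; unsigned int len; };

/* Stand-in for struct nameidata: large, needed only while walking. */
struct demo_nameidata {
	struct demo_path path;
	struct demo_qstr last;
	int last_type;
	char scratch[256];	/* models the rest of the walk state */
};

/* Fake walk: the "parent" is identified by the length of the prefix. */
static void demo_walk(const char *name, struct demo_nameidata *nd)
{
	const char *slash = strrchr(name, '/');
	const char *last = slash ? slash + 1 : name;

	memset(nd, 0, sizeof(*nd));
	nd->path.dentry = (int)(last - name);
	nd->last.name = last;
	nd->last.len = (unsigned int)strlen(last);
	nd->last_type = 0;	/* plays the role of LAST_NORM */
}

/* Helper in the style of the patched user_path_parent(). */
static int demo_path_parent(const char *name, struct demo_path *parent,
			    struct demo_qstr *last, int *type)
{
	struct demo_nameidata nd;	/* lives only for this call */

	demo_walk(name, &nd);
	*parent = nd.path;
	*last = nd.last;
	*type = nd.last_type;
	return 0;
}

/* Caller in the style of do_rmdir(): only the small results sit on its stack. */
static unsigned int demo_caller(const char *name)
{
	struct demo_path parent;
	struct demo_qstr last;
	int type;

	if (demo_path_parent(name, &parent, &last, &type))
		return 0;
	assert(type == 0);
	return last.len;
}
```

By the time demo_caller() would go on to call into a filesystem method,
the frame holding demo_nameidata has already been popped, which is the
effect the commit message describes for ->rmdir(), ->unlink() and
->rename().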

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 124 +++++++++++++++++++++++++++++++++----------------------------
1 file changed, 67 insertions(+), 57 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1439aa3..293300c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2211,9 +2211,13 @@ EXPORT_SYMBOL(user_path_at);
* path-walking is complete.
*/
static struct filename *
-user_path_parent(int dfd, const char __user *path, struct nameidata *nd,
+user_path_parent(int dfd, const char __user *path,
+ struct path *parent,
+ struct qstr *last,
+ int *type,
unsigned int flags)
{
+ struct nameidata nd;
struct filename *s = getname(path);
int error;

@@ -2223,11 +2227,14 @@ user_path_parent(int dfd, const char __user *path, struct nameidata *nd,
if (IS_ERR(s))
return s;

- error = filename_lookup(dfd, s, flags | LOOKUP_PARENT, nd);
+ error = filename_lookup(dfd, s, flags | LOOKUP_PARENT, &nd);
if (error) {
putname(s);
return ERR_PTR(error);
}
+ *parent = nd.path;
+ *last = nd.last;
+ *type = nd.last_type;

return s;
}
@@ -3630,14 +3637,17 @@ static long do_rmdir(int dfd, const char __user *pathname)
int error = 0;
struct filename *name;
struct dentry *dentry;
- struct nameidata nd;
+ struct path path;
+ struct qstr last;
+ int type;
unsigned int lookup_flags = 0;
retry:
- name = user_path_parent(dfd, pathname, &nd, lookup_flags);
+ name = user_path_parent(dfd, pathname,
+ &path, &last, &type, lookup_flags);
if (IS_ERR(name))
return PTR_ERR(name);

- switch(nd.last_type) {
+ switch (type) {
case LAST_DOTDOT:
error = -ENOTEMPTY;
goto exit1;
@@ -3649,13 +3659,12 @@ retry:
goto exit1;
}

- nd.flags &= ~LOOKUP_PARENT;
- error = mnt_want_write(nd.path.mnt);
+ error = mnt_want_write(path.mnt);
if (error)
goto exit1;

- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
+ mutex_lock_nested(&path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ dentry = __lookup_hash(&last, path.dentry, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
@@ -3663,17 +3672,17 @@ retry:
error = -ENOENT;
goto exit3;
}
- error = security_path_rmdir(&nd.path, dentry);
+ error = security_path_rmdir(&path, dentry);
if (error)
goto exit3;
- error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+ error = vfs_rmdir(path.dentry->d_inode, dentry);
exit3:
dput(dentry);
exit2:
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
- mnt_drop_write(nd.path.mnt);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
+ mnt_drop_write(path.mnt);
exit1:
- path_put(&nd.path);
+ path_put(&path);
putname(name);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -3756,43 +3765,45 @@ static long do_unlinkat(int dfd, const char __user *pathname)
int error;
struct filename *name;
struct dentry *dentry;
- struct nameidata nd;
+ struct path path;
+ struct qstr last;
+ int type;
struct inode *inode = NULL;
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0;
retry:
- name = user_path_parent(dfd, pathname, &nd, lookup_flags);
+ name = user_path_parent(dfd, pathname,
+ &path, &last, &type, lookup_flags);
if (IS_ERR(name))
return PTR_ERR(name);

error = -EISDIR;
- if (nd.last_type != LAST_NORM)
+ if (type != LAST_NORM)
goto exit1;

- nd.flags &= ~LOOKUP_PARENT;
- error = mnt_want_write(nd.path.mnt);
+ error = mnt_want_write(path.mnt);
if (error)
goto exit1;
retry_deleg:
- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
+ mutex_lock_nested(&path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ dentry = __lookup_hash(&last, path.dentry, lookup_flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
/* Why not before? Because we want correct error value */
- if (nd.last.name[nd.last.len])
+ if (last.name[last.len])
goto slashes;
inode = dentry->d_inode;
if (d_is_negative(dentry))
goto slashes;
ihold(inode);
- error = security_path_unlink(&nd.path, dentry);
+ error = security_path_unlink(&path, dentry);
if (error)
goto exit2;
- error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
+ error = vfs_unlink(path.dentry->d_inode, dentry, &delegated_inode);
exit2:
dput(dentry);
}
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
if (inode)
iput(inode); /* truncate the inode here */
inode = NULL;
@@ -3801,9 +3812,9 @@ exit2:
if (!error)
goto retry_deleg;
}
- mnt_drop_write(nd.path.mnt);
+ mnt_drop_write(path.mnt);
exit1:
- path_put(&nd.path);
+ path_put(&path);
putname(name);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -4233,14 +4244,15 @@ EXPORT_SYMBOL(vfs_rename);
SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
int, newdfd, const char __user *, newname, unsigned int, flags)
{
- struct dentry *old_dir, *new_dir;
struct dentry *old_dentry, *new_dentry;
struct dentry *trap;
- struct nameidata oldnd, newnd;
+ struct path old_path, new_path;
+ struct qstr old_last, new_last;
+ int old_type, new_type;
struct inode *delegated_inode = NULL;
struct filename *from;
struct filename *to;
- unsigned int lookup_flags = 0;
+ unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
bool should_retry = false;
int error;

@@ -4254,47 +4266,45 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
if ((flags & RENAME_WHITEOUT) && !capable(CAP_MKNOD))
return -EPERM;

+ if (flags & RENAME_EXCHANGE)
+ target_flags = 0;
+
retry:
- from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
+ from = user_path_parent(olddfd, oldname,
+ &old_path, &old_last, &old_type, lookup_flags);
if (IS_ERR(from)) {
error = PTR_ERR(from);
goto exit;
}

- to = user_path_parent(newdfd, newname, &newnd, lookup_flags);
+ to = user_path_parent(newdfd, newname,
+ &new_path, &new_last, &new_type, lookup_flags);
if (IS_ERR(to)) {
error = PTR_ERR(to);
goto exit1;
}

error = -EXDEV;
- if (oldnd.path.mnt != newnd.path.mnt)
+ if (old_path.mnt != new_path.mnt)
goto exit2;

- old_dir = oldnd.path.dentry;
error = -EBUSY;
- if (oldnd.last_type != LAST_NORM)
+ if (old_type != LAST_NORM)
goto exit2;

- new_dir = newnd.path.dentry;
if (flags & RENAME_NOREPLACE)
error = -EEXIST;
- if (newnd.last_type != LAST_NORM)
+ if (new_type != LAST_NORM)
goto exit2;

- error = mnt_want_write(oldnd.path.mnt);
+ error = mnt_want_write(old_path.mnt);
if (error)
goto exit2;

- oldnd.flags &= ~LOOKUP_PARENT;
- newnd.flags &= ~LOOKUP_PARENT;
- if (!(flags & RENAME_EXCHANGE))
- newnd.flags |= LOOKUP_RENAME_TARGET;
-
retry_deleg:
- trap = lock_rename(new_dir, old_dir);
+ trap = lock_rename(new_path.dentry, old_path.dentry);

- old_dentry = __lookup_hash(&oldnd.last, oldnd.path.dentry, oldnd.flags);
+ old_dentry = __lookup_hash(&old_last, old_path.dentry, lookup_flags);
error = PTR_ERR(old_dentry);
if (IS_ERR(old_dentry))
goto exit3;
@@ -4302,7 +4312,7 @@ retry_deleg:
error = -ENOENT;
if (d_is_negative(old_dentry))
goto exit4;
- new_dentry = __lookup_hash(&newnd.last, newnd.path.dentry, newnd.flags);
+ new_dentry = __lookup_hash(&new_last, new_path.dentry, lookup_flags | target_flags);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto exit4;
@@ -4316,16 +4326,16 @@ retry_deleg:

if (!d_is_dir(new_dentry)) {
error = -ENOTDIR;
- if (newnd.last.name[newnd.last.len])
+ if (new_last.name[new_last.len])
goto exit5;
}
}
/* unless the source is a directory trailing slashes give -ENOTDIR */
if (!d_is_dir(old_dentry)) {
error = -ENOTDIR;
- if (oldnd.last.name[oldnd.last.len])
+ if (old_last.name[old_last.len])
goto exit5;
- if (!(flags & RENAME_EXCHANGE) && newnd.last.name[newnd.last.len])
+ if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
goto exit5;
}
/* source should not be ancestor of target */
@@ -4338,32 +4348,32 @@ retry_deleg:
if (new_dentry == trap)
goto exit5;

- error = security_path_rename(&oldnd.path, old_dentry,
- &newnd.path, new_dentry, flags);
+ error = security_path_rename(&old_path, old_dentry,
+ &new_path, new_dentry, flags);
if (error)
goto exit5;
- error = vfs_rename(old_dir->d_inode, old_dentry,
- new_dir->d_inode, new_dentry,
+ error = vfs_rename(old_path.dentry->d_inode, old_dentry,
+ new_path.dentry->d_inode, new_dentry,
&delegated_inode, flags);
exit5:
dput(new_dentry);
exit4:
dput(old_dentry);
exit3:
- unlock_rename(new_dir, old_dir);
+ unlock_rename(new_path.dentry, old_path.dentry);
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
if (!error)
goto retry_deleg;
}
- mnt_drop_write(oldnd.path.mnt);
+ mnt_drop_write(old_path.mnt);
exit2:
if (retry_estale(error, lookup_flags))
should_retry = true;
- path_put(&newnd.path);
+ path_put(&new_path);
putname(to);
exit1:
- path_put(&oldnd.path);
+ path_put(&old_path);
putname(from);
if (should_retry) {
should_retry = false;
--
2.1.4

2015-05-11 18:08:34

by Al Viro

Subject: [PATCH v3 028/110] namei: lift nameidata into filename_mountpoint()

From: Al Viro <[email protected]>

When we go for on-demand allocation of saved state in
link_path_walk(), we'll want nameidata to stay around
for all 3 calls of path_mountpoint(). So declare it once in
filename_mountpoint() and pass it down by pointer.
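
The three calls are the usual fallback chain: an RCU-mode attempt, a
retry without RCU on -ECHILD, and a retry with revalidation on -ESTALE.
A minimal userspace sketch of that shape with a single caller-owned state
object follows. Every sk_* name and flag value is hypothetical and made
up for illustration; only the order of the retries is taken from the
patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values for this sketch; the kernel's own differ. */
#define SK_LOOKUP_RCU   0x1u
#define SK_LOOKUP_REVAL 0x2u

/* Stand-in for struct nameidata: state that outlives a single walk. */
struct sk_nameidata {
	unsigned int attempts;   /* number of walks that used this state */
	unsigned int last_flags; /* flags passed to the most recent walk */
};

/*
 * Stand-in for path_mountpoint(). To exercise every retry it fails with
 * -ECHILD in RCU mode and with -ESTALE until revalidation is requested.
 */
static int sk_path_mountpoint(struct sk_nameidata *nd, unsigned int flags)
{
	nd->attempts++;
	nd->last_flags = flags;
	if (flags & SK_LOOKUP_RCU)
		return -ECHILD;
	if (!(flags & SK_LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}

/*
 * Stand-in for filename_mountpoint(): the state is declared once here
 * and handed to each attempt, instead of living inside the callee.
 */
static int sk_filename_mountpoint(unsigned int flags, unsigned int *attempts)
{
	struct sk_nameidata nd = { 0, 0 };
	int error;

	error = sk_path_mountpoint(&nd, flags | SK_LOOKUP_RCU);
	if (error == -ECHILD)
		error = sk_path_mountpoint(&nd, flags);
	if (error == -ESTALE)
		error = sk_path_mountpoint(&nd, flags | SK_LOOKUP_REVAL);

	/* The chain never makes more than three walks. */
	assert(nd.attempts >= 1 && nd.attempts <= 3);
	if (attempts)
		*attempts = nd.attempts;
	return error;
}
```

Because the state belongs to the caller, anything allocated on demand
during one attempt can in principle stay attached to it across the
retries, which is the motivation given above.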

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 293300c..ab2bcbd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2344,31 +2344,28 @@ out:
*/
static int
path_mountpoint(int dfd, const struct filename *name, struct path *path,
- unsigned int flags)
+ struct nameidata *nd, unsigned int flags)
{
- struct nameidata nd;
- int err;
-
- err = path_init(dfd, name, flags, &nd);
+ int err = path_init(dfd, name, flags, nd);
if (unlikely(err))
goto out;

- err = mountpoint_last(&nd, path);
+ err = mountpoint_last(nd, path);
while (err > 0) {
void *cookie;
struct path link = *path;
- err = may_follow_link(&link, &nd);
+ err = may_follow_link(&link, nd);
if (unlikely(err))
break;
- nd.flags |= LOOKUP_PARENT;
- err = follow_link(&link, &nd, &cookie);
+ nd->flags |= LOOKUP_PARENT;
+ err = follow_link(&link, nd, &cookie);
if (err)
break;
- err = mountpoint_last(&nd, path);
- put_link(&nd, &link, cookie);
+ err = mountpoint_last(nd, path);
+ put_link(nd, &link, cookie);
}
out:
- path_cleanup(&nd);
+ path_cleanup(nd);
return err;
}

@@ -2376,14 +2373,15 @@ static int
filename_mountpoint(int dfd, struct filename *name, struct path *path,
unsigned int flags)
{
+ struct nameidata nd;
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- error = path_mountpoint(dfd, name, path, flags | LOOKUP_RCU);
+ error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
- error = path_mountpoint(dfd, name, path, flags);
+ error = path_mountpoint(dfd, name, path, &nd, flags);
if (unlikely(error == -ESTALE))
- error = path_mountpoint(dfd, name, path, flags | LOOKUP_REVAL);
+ error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
putname(name);
--
2.1.4

2015-05-11 18:36:22

by Al Viro

Subject: [PATCH v3 029/110] new ->follow_link() and ->put_link() calling conventions

From: Al Viro <[email protected]>

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() now _stores_
that opaque pointer (into the void * passed by address by the caller) and
returns the symlink body. It returns ERR_PTR() on error, NULL on jump
(procfs magic symlinks) and a pointer to the symlink body for normal
symlinks. The stored pointer is ignored in all cases except the last one.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases where ->put_link() used the
symlink body to free stuff, ->follow_link() should now store the body as
the opaque pointer in addition to returning it.
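
A userspace sketch of the convention described in a) and b) may help
when reading the per-filesystem hunks. It is not kernel code: every sk_*
name is hypothetical, the ERR_PTR helpers are simplified stand-ins, and
the caller is only loosely modeled on generic_readlink(). It assumes the
caller sets the cookie to NULL before the call, as follow_link() in
fs/namei.c does with "*p = NULL", so that "not stored" and "stored NULL"
look the same.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified userspace stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
#define SK_MAX_ERRNO 4095
static inline void *sk_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sk_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Hypothetical miniature of the two inode operations. */
struct sk_link_ops {
	const char *(*follow_link)(const void *priv, void **cookie);
	void (*put_link)(void *cookie);
};

static int sk_puts_called; /* counts put_link invocations for the demo */

/* Case b): the body itself must be freed, so it doubles as the cookie. */
static const char *sk_alloc_follow(const void *priv, void **cookie)
{
	const char *src = priv;
	char *copy = malloc(strlen(src) + 1);

	if (!copy)
		return sk_err_ptr(-ENOMEM); /* error: cookie left alone */
	strcpy(copy, src);
	return *cookie = copy;
}

static void sk_alloc_put(void *cookie)
{
	sk_puts_called++;
	free(cookie);
}

/* Inline body: nothing to release, so no cookie is stored at all. */
static const char *sk_inline_follow(const void *priv, void **cookie)
{
	(void)cookie;
	return priv;
}

/* Caller side: copies the body into buf and returns its length. */
static long sk_read_link(const struct sk_link_ops *ops, const void *priv,
			 char *buf, size_t len)
{
	void *cookie = NULL; /* preinitialized by the caller */
	const char *body;
	size_t n;

	if (len == 0)
		return -EINVAL;
	body = ops->follow_link(priv, &cookie);
	if (sk_is_err(body))
		return sk_ptr_err(body);
	if (!body)
		return 0; /* "jump" case: there is no body to copy */

	n = strlen(body);
	if (n >= len)
		n = len - 1;
	memcpy(buf, body, n);
	buf[n] = '\0';

	/* ->put_link() is called only when a non-NULL cookie was stored. */
	if (cookie && ops->put_link)
		ops->put_link(cookie);
	return (long)n;
}
```

With sk_alloc_follow the release hook runs exactly once per successful
lookup; with sk_inline_follow it never runs, even if a put_link method is
present, because no cookie was stored.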

Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/Locking | 4 +-
Documentation/filesystems/vfs.txt | 4 +-
drivers/staging/lustre/lustre/llite/symlink.c | 11 ++---
fs/9p/vfs_inode.c | 13 +++---
fs/9p/vfs_inode_dotl.c | 7 ++-
fs/autofs4/symlink.c | 5 +-
fs/befs/linuxvfs.c | 35 +++++++-------
fs/cifs/cifsfs.h | 2 +-
fs/cifs/link.c | 28 ++++++------
fs/configfs/symlink.c | 28 +++++-------
fs/ecryptfs/inode.c | 8 ++--
fs/ext4/symlink.c | 9 ++--
fs/f2fs/namei.c | 18 +++-----
fs/fuse/dir.c | 19 ++------
fs/gfs2/inode.c | 10 ++--
fs/hostfs/hostfs_kern.c | 15 +++---
fs/hppfs/hppfs.c | 9 ++--
fs/kernfs/symlink.c | 22 ++++-----
fs/libfs.c | 12 ++---
fs/namei.c | 66 +++++++++------------------
fs/nfs/symlink.c | 19 +++-----
fs/overlayfs/inode.c | 18 ++++----
fs/proc/base.c | 2 +-
fs/proc/inode.c | 9 ++--
fs/proc/namespaces.c | 2 +-
fs/proc/self.c | 24 +++++-----
fs/proc/thread_self.c | 22 ++++-----
fs/xfs/xfs_iops.c | 10 ++--
include/linux/fs.h | 12 ++---
include/linux/namei.h | 2 -
mm/shmem.c | 23 +++++-----
31 files changed, 195 insertions(+), 273 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0a926e2..7fa6c4a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,8 +50,8 @@ prototypes:
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ void (*put_link) (struct dentry *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5d833b3..1c6b03a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ void (*put_link) (struct dentry *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 3711e67..e488cb3 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
return rc;
}

-static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct ptlrpc_request *request = NULL;
@@ -140,18 +140,17 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
}
if (rc) {
ptlrpc_req_finished(request);
- request = NULL;
- symname = ERR_PTR(rc);
+ return ERR_PTR(rc);
}

- nd_set_link(nd, symname);
/* symname may contain a pointer to the request message buffer,
* we delay request releasing until ll_put_link then.
*/
- return request;
+ *cookie = request;
+ return symname;
}

-static void ll_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void ll_put_link(struct dentry *dentry, void *cookie)
{
ptlrpc_req_finished(cookie);
}
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0ba1171..7cc70a3 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1230,11 +1230,12 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
*
*/

-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
struct p9_fid *fid = v9fs_fid_lookup(dentry);
struct p9_wstat *st;
+ char *res;

p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

@@ -1253,14 +1254,14 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
kfree(st);
return ERR_PTR(-EINVAL);
}
- if (strlen(st->extension) >= PATH_MAX)
- st->extension[PATH_MAX - 1] = '\0';
-
- nd_set_link(nd, st->extension);
+ res = st->extension;
st->extension = NULL;
+ if (strlen(res) >= PATH_MAX)
+ res[PATH_MAX - 1] = '\0';
+
p9stat_free(st);
kfree(st);
- return NULL;
+ return *cookie = res;
}

/**
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index bc2a91f..ae062ff 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -909,8 +909,8 @@ error:
*
*/

-static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
+static const char *
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
@@ -923,8 +923,7 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
retval = p9_client_readlink(fid, &target);
if (retval)
return ERR_PTR(retval);
- nd_set_link(nd, target);
- return NULL;
+ return *cookie = target;
}

int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index de58cc7..9c6a077 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,14 +12,13 @@

#include "autofs_i.h"

-static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *autofs4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
if (ino && !autofs4_oz_mode(sbi))
ino->last_used = jiffies;
- nd_set_link(nd, d_inode(dentry)->i_private);
- return NULL;
+ return d_inode(dentry)->i_private;
}

const struct inode_operations autofs4_symlink_inode_operations = {
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 172e306..3a1aefb 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,7 +42,7 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *, struct nameidata *);
+static const char *befs_follow_link(struct dentry *, void **, struct nameidata *nd);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -463,8 +463,8 @@ befs_destroy_inodecache(void)
* The data stream become link name. Unless the LONG_SYMLINK
* flag is set.
*/
-static void *
-befs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *
+befs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct super_block *sb = dentry->d_sb;
struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
@@ -474,23 +474,20 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)

if (len == 0) {
befs_error(sb, "Long symlink with illegal length");
- link = ERR_PTR(-EIO);
- } else {
- befs_debug(sb, "Follow long symlink");
-
- link = kmalloc(len, GFP_NOFS);
- if (!link) {
- link = ERR_PTR(-ENOMEM);
- } else if (befs_read_lsymlink(sb, data, link, len) != len) {
- kfree(link);
- befs_error(sb, "Failed to read entire long symlink");
- link = ERR_PTR(-EIO);
- } else {
- link[len - 1] = '\0';
- }
+ return ERR_PTR(-EIO);
}
- nd_set_link(nd, link);
- return NULL;
+ befs_debug(sb, "Follow long symlink");
+
+ link = kmalloc(len, GFP_NOFS);
+ if (!link)
+ return ERR_PTR(-ENOMEM);
+ if (befs_read_lsymlink(sb, data, link, len) != len) {
+ kfree(link);
+ befs_error(sb, "Failed to read entire long symlink");
+ return ERR_PTR(-EIO);
+ }
+ link[len - 1] = '\0';
+ return *cookie = link;
}

/*
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 252f5c1..61012da 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
#endif

/* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern const char *cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd);
extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
int buflen);
extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 252e672..4a439c2 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -626,8 +626,8 @@ cifs_hl_exit:
return rc;
}

-void *
-cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
+const char *
+cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(direntry);
int rc = -ENOMEM;
@@ -643,16 +643,18 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)

tlink = cifs_sb_tlink(cifs_sb);
if (IS_ERR(tlink)) {
- rc = PTR_ERR(tlink);
- tlink = NULL;
- goto out;
+ free_xid(xid);
+ return ERR_CAST(tlink);
}
tcon = tlink_tcon(tlink);
server = tcon->ses->server;

full_path = build_path_from_dentry(direntry);
- if (!full_path)
- goto out;
+ if (!full_path) {
+ free_xid(xid);
+ cifs_put_tlink(tlink);
+ return ERR_PTR(-ENOMEM);
+ }

cifs_dbg(FYI, "Full path: %s inode = 0x%p\n", full_path, inode);

@@ -670,17 +672,13 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
&target_path, cifs_sb);

kfree(full_path);
-out:
+ free_xid(xid);
+ cifs_put_tlink(tlink);
if (rc != 0) {
kfree(target_path);
- target_path = ERR_PTR(rc);
+ return ERR_PTR(rc);
}
-
- free_xid(xid);
- if (tlink)
- cifs_put_tlink(tlink);
- nd_set_link(nd, target_path);
- return NULL;
+ return *cookie = target_path;
}

int
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f254..fac8e85 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,30 +279,26 @@ static int configfs_getlink(struct dentry *dentry, char * path)

}

-static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *configfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
+ int error;

- if (page) {
- error = configfs_getlink(dentry, (char *)page);
- if (!error) {
- nd_set_link(nd, (char *)page);
- return (void *)page;
- }
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ error = configfs_getlink(dentry, (char *)page);
+ if (!error) {
+ return *cookie = (void *)page;
}

- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ free_page(page);
+ return ERR_PTR(error);
}

-static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void configfs_put_link(struct dentry *dentry, void *cookie)
{
- if (cookie) {
- unsigned long page = (unsigned long)cookie;
- free_page(page);
- }
+ free_page((unsigned long)cookie);
}

const struct inode_operations configfs_symlink_inode_operations = {
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index fc850b5..cdb9d6c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -675,18 +675,16 @@ out:
return rc ? ERR_PTR(rc) : buf;
}

-static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
size_t len;
char *buf = ecryptfs_readlink_lower(dentry, &len);
if (IS_ERR(buf))
- goto out;
+ return buf;
fsstack_copy_attr_atime(d_inode(dentry),
d_inode(ecryptfs_dentry_to_lower(dentry)));
buf[len] = '\0';
-out:
- nd_set_link(nd, buf);
- return NULL;
+ return *cookie = buf;
}

/**
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 4264fb1..afec475 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,7 +23,7 @@
#include "xattr.h"

#ifdef CONFIG_EXT4_FS_ENCRYPTION
-static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ext4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *cpage = NULL;
char *caddr, *paddr = NULL;
@@ -37,7 +37,7 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)

ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize);
if (IS_ERR(ctx))
- return ctx;
+ return ERR_CAST(ctx);

if (ext4_inode_is_fast_symlink(inode)) {
caddr = (char *) EXT4_I(inode)->i_data;
@@ -46,7 +46,7 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
cpage = read_mapping_page(inode->i_mapping, 0, NULL);
if (IS_ERR(cpage)) {
ext4_put_fname_crypto_ctx(&ctx);
- return cpage;
+ return ERR_CAST(cpage);
}
caddr = kmap(cpage);
caddr[size] = 0;
@@ -77,13 +77,12 @@ static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
/* Null-terminate the name */
if (res <= plen)
paddr[res] = '\0';
- nd_set_link(nd, paddr);
ext4_put_fname_crypto_ctx(&ctx);
if (cpage) {
kunmap(cpage);
page_cache_release(cpage);
}
- return NULL;
+ return *cookie = paddr;
errout:
ext4_put_fname_crypto_ctx(&ctx);
if (cpage) {
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 658e807..d294793 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -296,19 +296,15 @@ fail:
return err;
}

-static void *f2fs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *f2fs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- struct page *page = page_follow_link_light(dentry, nd);
-
- if (IS_ERR_OR_NULL(page))
- return page;
-
- /* this is broken symlink case */
- if (*nd_get_link(nd) == 0) {
- page_put_link(dentry, nd, page);
- return ERR_PTR(-ENOENT);
+ const char *link = page_follow_link_light(dentry, cookie, nd);
+ if (!IS_ERR(link) && !*link) {
+ /* this is broken symlink case */
+ page_put_link(dentry, *cookie);
+ link = ERR_PTR(-ENOENT);
}
- return page;
+ return link;
}

static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0572bca..f9cb260 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1365,7 +1365,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
return err;
}

-static char *read_link(struct dentry *dentry)
+static const char *fuse_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1389,26 +1389,15 @@ static char *read_link(struct dentry *dentry)
link = ERR_PTR(ret);
} else {
link[ret] = '\0';
+ *cookie = link;
}
fuse_invalidate_atime(inode);
return link;
}

-static void free_link(char *link)
+static void fuse_put_link(struct dentry *dentry, void *cookie)
{
- if (!IS_ERR(link))
- free_page((unsigned long) link);
-}
-
-static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
- nd_set_link(nd, read_link(dentry));
- return NULL;
-}
-
-static void fuse_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
-{
- free_link(nd_get_link(nd));
+ free_page((unsigned long) cookie);
}

static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 1b3ca7a..f59390a 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1548,7 +1548,7 @@ out:
* Returns: 0 on success or error code
*/

-static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *gfs2_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
struct gfs2_holder i_gh;
@@ -1561,8 +1561,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
error = gfs2_glock_nq(&i_gh);
if (error) {
gfs2_holder_uninit(&i_gh);
- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ return ERR_PTR(error);
}

size = (unsigned int)i_size_read(&ip->i_inode);
@@ -1586,8 +1585,9 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
brelse(dibh);
out:
gfs2_glock_dq_uninit(&i_gh);
- nd_set_link(nd, buf);
- return NULL;
+ if (!IS_ERR(buf))
+ *cookie = buf;
+ return buf;
}

/**
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index ef26317..f650ed6 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -892,7 +892,7 @@ static const struct inode_operations hostfs_dir_iops = {
.setattr = hostfs_setattr,
};

-static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *hostfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
char *link = __getname();
if (link) {
@@ -906,21 +906,18 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
}
if (err < 0) {
__putname(link);
- link = ERR_PTR(err);
+ return ERR_PTR(err);
}
} else {
- link = ERR_PTR(-ENOMEM);
+ return ERR_PTR(-ENOMEM);
}

- nd_set_link(nd, link);
- return NULL;
+ return *cookie = link;
}

-static void hostfs_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, void *cookie)
{
- char *s = nd_get_link(nd);
- if (!IS_ERR(s))
- __putname(s);
+ __putname(cookie);
}

static const struct inode_operations hostfs_link_iops = {
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index fa2bd53..b8f24d3 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,20 +642,19 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
buflen);
}

-static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *hppfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

- return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, nd);
+ return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie, nd);
}

-static void hppfs_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void hppfs_put_link(struct dentry *dentry, void *cookie)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

if (d_inode(proc_dentry)->i_op->put_link)
- d_inode(proc_dentry)->i_op->put_link(proc_dentry, nd, cookie);
+ d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
}

static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a19889..3c7e799 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,25 +112,23 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
return error;
}

-static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
- if (page) {
- error = kernfs_getlink(dentry, (char *) page);
- if (error < 0)
- free_page((unsigned long)page);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+ error = kernfs_getlink(dentry, (char *)page);
+ if (unlikely(error < 0)) {
+ free_page((unsigned long)page);
+ return ERR_PTR(error);
}
- nd_set_link(nd, error ? ERR_PTR(error) : (char *)page);
- return NULL;
+ return *cookie = (char *)page;
}

-static void kernfs_iop_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+static void kernfs_iop_put_link(struct dentry *dentry, void *cookie)
{
- char *page = nd_get_link(nd);
- if (!IS_ERR(page))
- free_page((unsigned long)page);
+ free_page((unsigned long)cookie);
}

const struct inode_operations kernfs_symlink_iops = {
diff --git a/fs/libfs.c b/fs/libfs.c
index 72e4e01..0c83fde 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,12 +1024,9 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}
EXPORT_SYMBOL(noop_fsync);

-void kfree_put_link(struct dentry *dentry, struct nameidata *nd,
- void *cookie)
+void kfree_put_link(struct dentry *dentry, void *cookie)
{
- char *s = nd_get_link(nd);
- if (!IS_ERR(s))
- kfree(s);
+ kfree(cookie);
}
EXPORT_SYMBOL(kfree_put_link);

@@ -1094,10 +1091,9 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
}
EXPORT_SYMBOL(simple_nosetlease);

-void *simple_follow_link(struct dentry *dentry, struct nameidata *nd)
+const char *simple_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- nd_set_link(nd, d_inode(dentry)->i_link);
- return NULL;
+ return d_inode(dentry)->i_link;
}
EXPORT_SYMBOL(simple_follow_link);

diff --git a/fs/namei.c b/fs/namei.c
index ab2bcbd..aeca448 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -502,7 +502,6 @@ struct nameidata {
int last_type;
unsigned depth;
struct file *base;
- char *saved_names[MAX_NESTED_LINKS + 1];
};

/*
@@ -713,23 +712,11 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
nd->flags |= LOOKUP_JUMPED;
}

-void nd_set_link(struct nameidata *nd, char *path)
-{
- nd->saved_names[nd->depth] = path;
-}
-EXPORT_SYMBOL(nd_set_link);
-
-char *nd_get_link(struct nameidata *nd)
-{
- return nd->saved_names[nd->depth];
-}
-EXPORT_SYMBOL(nd_get_link);
-
static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
{
struct inode *inode = link->dentry->d_inode;
- if (inode->i_op->put_link)
- inode->i_op->put_link(link->dentry, nd, cookie);
+ if (cookie && inode->i_op->put_link)
+ inode->i_op->put_link(link->dentry, cookie);
path_put(link);
}

@@ -854,7 +841,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
{
struct dentry *dentry = link->dentry;
int error;
- char *s;
+ const char *s;

BUG_ON(nd->flags & LOOKUP_RCU);

@@ -869,26 +856,20 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
current->total_link_count++;

touch_atime(link);
- nd_set_link(nd, NULL);

error = security_inode_follow_link(dentry);
if (error)
goto out_put_nd_path;

nd->last_type = LAST_BIND;
- *p = dentry->d_inode->i_op->follow_link(dentry, nd);
- error = PTR_ERR(*p);
- if (IS_ERR(*p))
+ *p = NULL;
+ s = dentry->d_inode->i_op->follow_link(dentry, p, nd);
+ error = PTR_ERR(s);
+ if (IS_ERR(s))
goto out_put_nd_path;

error = 0;
- s = nd_get_link(nd);
if (s) {
- if (unlikely(IS_ERR(s))) {
- path_put(&nd->path);
- put_link(nd, link, *p);
- return PTR_ERR(s);
- }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -906,7 +887,6 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
return error;

out_put_nd_path:
- *p = NULL;
path_put(&nd->path);
path_put(link);
return error;
@@ -4430,18 +4410,15 @@ EXPORT_SYMBOL(readlink_copy);
*/
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
- struct nameidata nd;
void *cookie;
+ const char *link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
int res;

- nd.depth = 0;
- cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
- if (IS_ERR(cookie))
- return PTR_ERR(cookie);
-
- res = readlink_copy(buffer, buflen, nd_get_link(&nd));
- if (dentry->d_inode->i_op->put_link)
- dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ res = readlink_copy(buffer, buflen, link);
+ if (cookie && dentry->d_inode->i_op->put_link)
+ dentry->d_inode->i_op->put_link(dentry, cookie);
return res;
}
EXPORT_SYMBOL(generic_readlink);
@@ -4473,22 +4450,21 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
}
EXPORT_SYMBOL(page_readlink);

-void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+const char *page_follow_link_light(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *page = NULL;
- nd_set_link(nd, page_getlink(dentry, &page));
- return page;
+ char *res = page_getlink(dentry, &page);
+ if (!IS_ERR(res))
+ *cookie = page;
+ return res;
}
EXPORT_SYMBOL(page_follow_link_light);

-void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+void page_put_link(struct dentry *dentry, void *cookie)
{
struct page *page = cookie;
-
- if (page) {
- kunmap(page);
- page_cache_release(page);
- }
+ kunmap(page);
+ page_cache_release(page);
}
EXPORT_SYMBOL(page_put_link);

diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 2d56200..c992b20 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -20,7 +20,6 @@
#include <linux/stat.h>
#include <linux/mm.h>
#include <linux/string.h>
-#include <linux/namei.h>

/* Symlink caching in the page cache is even more simplistic
* and straight-forward than readdir caching.
@@ -43,7 +42,7 @@ error:
return -EIO;
}

-static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *nfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct page *page;
@@ -51,19 +50,13 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)

err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
if (err)
- goto read_failed;
+ return err;
page = read_cache_page(&inode->i_data, 0,
(filler_t *)nfs_symlink_filler, inode);
- if (IS_ERR(page)) {
- err = page;
- goto read_failed;
- }
- nd_set_link(nd, kmap(page));
- return page;
-
-read_failed:
- nd_set_link(nd, err);
- return NULL;
+ if (IS_ERR(page))
+ return ERR_CAST(page);
+ *cookie = page;
+ return kmap(page);
}

/*
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1b4b9c5e..235ad42 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,12 +140,12 @@ struct ovl_link_data {
void *cookie;
};

-static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
- void *ret;
struct dentry *realdentry;
struct inode *realinode;
struct ovl_link_data *data = NULL;
+ const char *ret;

realdentry = ovl_dentry_real(dentry);
realinode = realdentry->d_inode;
@@ -160,19 +160,21 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
data->realdentry = realdentry;
}

- ret = realinode->i_op->follow_link(realdentry, nd);
- if (IS_ERR(ret)) {
+ ret = realinode->i_op->follow_link(realdentry, cookie, nd);
+ if (IS_ERR_OR_NULL(ret)) {
kfree(data);
return ret;
}

if (data)
- data->cookie = ret;
+ data->cookie = *cookie;

- return data;
+ *cookie = data;
+
+ return ret;
}

-static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void ovl_put_link(struct dentry *dentry, void *c)
{
struct inode *realinode;
struct ovl_link_data *data = c;
@@ -181,7 +183,7 @@ static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
return;

realinode = data->realdentry->d_inode;
- realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+ realinode->i_op->put_link(data->realdentry, data->cookie);
kfree(data);
}

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 093ca14..52652f8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1380,7 +1380,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
return -ENOENT;
}

-static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
struct path path;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 8272aab..acd51d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -23,7 +23,6 @@
#include <linux/slab.h>
#include <linux/mount.h>
#include <linux/magic.h>
-#include <linux/namei.h>

#include <asm/uaccess.h>

@@ -394,16 +393,16 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
};
#endif

-static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct proc_dir_entry *pde = PDE(d_inode(dentry));
if (unlikely(!use_pde(pde)))
return ERR_PTR(-EINVAL);
- nd_set_link(nd, pde->data);
- return pde;
+ *cookie = pde;
+ return pde->data;
}

-static void proc_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void proc_put_link(struct dentry *dentry, void *p)
{
unuse_pde(p);
}
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index e512642..10d24dd 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
&mntns_operations,
};

-static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct inode *inode = d_inode(dentry);
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 6195b4a..ad33394 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -1,5 +1,4 @@
#include <linux/sched.h>
-#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/pid_namespace.h>
#include "internal.h"
@@ -19,21 +18,20 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
- char *name = ERR_PTR(-ENOENT);
- if (tgid) {
- /* 11 for max length of signed int in decimal + NULL term */
- name = kmalloc(12, GFP_KERNEL);
- if (!name)
- name = ERR_PTR(-ENOMEM);
- else
- sprintf(name, "%d", tgid);
- }
- nd_set_link(nd, name);
- return NULL;
+ char *name;
+
+ if (!tgid)
+ return ERR_PTR(-ENOENT);
+ /* 11 for max length of signed int in decimal + NULL term */
+ name = kmalloc(12, GFP_KERNEL);
+ if (!name)
+ return ERR_PTR(-ENOMEM);
+ sprintf(name, "%d", tgid);
+ return *cookie = name;
}

static const struct inode_operations proc_self_inode_operations = {
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index a837199..85c96e0 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -1,5 +1,4 @@
#include <linux/sched.h>
-#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/pid_namespace.h>
#include "internal.h"
@@ -20,21 +19,20 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
pid_t pid = task_pid_nr_ns(current, ns);
- char *name = ERR_PTR(-ENOENT);
- if (pid) {
- name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
- if (!name)
- name = ERR_PTR(-ENOMEM);
- else
- sprintf(name, "%d/task/%d", tgid, pid);
- }
- nd_set_link(nd, name);
- return NULL;
+ char *name;
+
+ if (!pid)
+ return ERR_PTR(-ENOENT);
+ name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
+ if (!name)
+ return ERR_PTR(-ENOMEM);
+ sprintf(name, "%d/task/%d", tgid, pid);
+ return *cookie = name;
}

static const struct inode_operations proc_thread_self_inode_operations = {
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f4cd720..26c4dcb 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -41,7 +41,6 @@

#include <linux/capability.h>
#include <linux/xattr.h>
-#include <linux/namei.h>
#include <linux/posix_acl.h>
#include <linux/security.h>
#include <linux/fiemap.h>
@@ -414,9 +413,10 @@ xfs_vn_rename(
* we need to be very careful about how much stack we use.
* uio is kmalloced for this reason...
*/
-STATIC void *
+STATIC const char *
xfs_vn_follow_link(
struct dentry *dentry,
+ void **cookie,
struct nameidata *nd)
{
char *link;
@@ -430,14 +430,12 @@ xfs_vn_follow_link(
if (unlikely(error))
goto out_kfree;

- nd_set_link(nd, link);
- return NULL;
+ return *cookie = link;

out_kfree:
kfree(link);
out_err:
- nd_set_link(nd, ERR_PTR(error));
- return NULL;
+ return ERR_PTR(error);
}

STATIC int
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ac758f..9ab9341 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1608,12 +1608,12 @@ struct file_operations {

struct inode_operations {
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
+ const char * (*follow_link) (struct dentry *, void **, struct nameidata *);
int (*permission) (struct inode *, int);
struct posix_acl * (*get_acl)(struct inode *, int);

int (*readlink) (struct dentry *, char __user *,int);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ void (*put_link) (struct dentry *, void *);

int (*create) (struct inode *,struct dentry *, umode_t, bool);
int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2705,13 +2705,13 @@ extern const struct file_operations generic_ro_fops;

extern int readlink_copy(char __user *, int, const char *);
extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *, struct nameidata *);
-extern void page_put_link(struct dentry *, struct nameidata *, void *);
+extern const char *page_follow_link_light(struct dentry *, void **, struct nameidata *);
+extern void page_put_link(struct dentry *, void *);
extern int __page_symlink(struct inode *inode, const char *symname, int len,
int nofs);
extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, struct nameidata *, void *);
+extern void kfree_put_link(struct dentry *, void *);
extern int generic_readlink(struct dentry *, char __user *, int);
extern void generic_fillattr(struct inode *, struct kstat *);
int vfs_getattr_nosec(struct path *path, struct kstat *stat);
@@ -2722,7 +2722,7 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
-void *simple_follow_link(struct dentry *, struct nameidata *);
+const char *simple_follow_link(struct dentry *, void **, struct nameidata *);
extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c899077..a5d5bed 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -71,8 +71,6 @@ extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);

extern void nd_jump_link(struct nameidata *nd, struct path *path);
-extern void nd_set_link(struct nameidata *nd, char *path);
-extern char *nd_get_link(struct nameidata *nd);

static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index 7f6e2f8..d1693dc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2475,24 +2475,23 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
+static const char *shmem_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
{
struct page *page = NULL;
int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
- nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
- if (page)
- unlock_page(page);
- return page;
+ if (error)
+ return ERR_PTR(error);
+ unlock_page(page);
+ *cookie = page;
+ return kmap(page);
}

-static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void shmem_put_link(struct dentry *dentry, void *cookie)
{
- if (!IS_ERR(nd_get_link(nd))) {
- struct page *page = cookie;
- kunmap(page);
- mark_page_accessed(page);
- page_cache_release(page);
- }
+ struct page *page = cookie;
+ kunmap(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
}

#ifdef CONFIG_TMPFS_XATTR
--
2.1.4

2015-05-11 18:06:59

by Al Viro

Subject: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks


> There are several things in that queue; most of it is fs/namei.c
> rewrite and closer to the end we get
> * non-recursive link_path_walk()
> * the only symlink-related limitation is that there should be no
> more than 40 symlinks traversals, no matter how nested they are.
> * stack footprint independent from the nesting and about on par with
> mainline in "no symlinks" case.
> * some work towards the symlink traversal without dropping from
> RCU mode; not complete, but much better starting point than the current
> mainline.
> * IMO more readable logics around the pathname resolution in
> general, but then I'd spent the last couple of weeks messing with that code,
> so my perception might be seriously warped.

... and in addition to that - symlink traversal without dropping out of
RCU mode. For now - only the fast symlinks, but extending that to symlinks
with non-trivial ->follow_link() is pretty straightforward.

> Most of that stuff is mine, with exception of 3 commits from
> Neil's series (with some modifications). I've marked those with [N] in
> the list below.

6 patches from Neil's series now, plus one from David Howells (marked [D]).

> It really needs review and more testing; it *does* pass xfstests,
> LTP and local tests with no regressions, but at the very least it'll need
> careful profiling, as well as more eyes to read it through. The whole
> thing is based at -rc1. Individual patches are in followups; those who
> prefer to access it via git can fetch it at
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git link_path_walk

Same branch, rebased on -rc3. Still passes all tests with no regressions.
Balance is at about -0.2KLoC...

Please, read and comment.

Series layout is pretty much unchanged - the first 76 commits (out of 110 ;-/)
are fairly close to what had been posted in the previous series.

* assorted preliminary patches:

[1/110] 9p: don't bother with 4K allocation for 24-byte local array...
[2/110] 9p: don't bother with __getname() in ->follow_link()
[3/110][N] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
[4/110] ext4: split inode_operations for encrypted symlinks off the rest

* getting rid of boilerplate for inline symlinks

[5/110] libfs: simple_follow_link()
[6/110] ext2: use simple_follow_link()
[7/110] befs: switch to simple_follow_link()
[8/110] ext3: switch to simple_follow_link()
[9/110] ext4: switch to simple_follow_link()
[10/110] jffs2: switch to simple_follow_link()
[11/110] shmem: switch to simple_follow_link()
[12/110] debugfs: switch to simple_follow_link()
[13/110] ufs: switch to simple_follow_link()
[14/110] ubifs: switch to simple_follow_link()
[15/110] sysv: switch to simple_follow_link()
[16/110] jfs: switch to simple_follow_link()
[17/110] freevxfs: switch to simple_follow_link()
[18/110] exofs: switch to {simple,page}_symlink_inode_operations
[19/110] ceph: switch to simple_follow_link()

* more assorted preparations:
[20/110] logfs: fix a pagecache leak for symlinks
[21/110][N] SECURITY: remove nameidata arg from inode_follow_link.
[22/110] uninline walk_component()

* getting trailing symlinks handling in do_last() into saner shape:
[23/110] namei: take O_NOFOLLOW treatment into do_last()
[24/110] do_last: kill symlink_ok
[25/110] do_last: regularize the logics around following symlinks

* getting rid of nameidata on stack during ->rmdir()/->unlink() and double
nameidata during ->rename():
[26/110] namei: get rid of lookup_hash()
[27/110] name: shift nameidata down into user_path_walk()

* single instance of nameidata across the adjacent path_mountpoint() calls:
[28/110] namei: lift nameidata into filename_mountpoint()

At that point preparations end. The next one changes the calling conventions
for ->follow_link/->put_link. We still are passing nameidata to
->follow_link(), but almost all instances do not use it anymore. Moreover,
nd->saved_names[] is gone and so's nameidata in generic_readlink().
That one has the biggest footprint outside of fs/namei.c - it's a flagday
change.

[29/110] new ->follow_link() and ->put_link() calling conventions
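
As an illustration of the convention described above, here is a minimal userspace sketch (not kernel code). The method returns the link body directly, or an ERR_PTR-encoded error, and hands whatever needs freeing back through *cookie; the matching put_link only sees the cookie. Everything prefixed toy_ is hypothetical, and the ERR_PTR()/PTR_ERR()/IS_ERR() helpers are simplified stand-ins for the kernel ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical miniature inode: the link body is derived from target_id. */
struct toy_inode {
	int target_id;		/* <= 0 means "nothing to point at" */
};

/* New-style follow_link: return the body, store the cookie for put_link. */
const char *toy_follow_link(struct toy_inode *inode, void **cookie)
{
	char *name;

	if (inode->target_id <= 0)
		return ERR_PTR(-ENOENT);
	name = malloc(12);	/* 11 chars for a signed int + NUL */
	if (!name)
		return ERR_PTR(-ENOMEM);
	snprintf(name, 12, "%d", inode->target_id);
	return *cookie = name;	/* body and cookie are the same buffer here */
}

/* New-style put_link: no nameidata, just the cookie. */
void toy_put_link(struct toy_inode *inode, void *cookie)
{
	(void)inode;
	free(cookie);
}

/* A caller in the spirit of generic_readlink(): copy the body, then release. */
int toy_readlink(struct toy_inode *inode, char *buf, size_t buflen)
{
	void *cookie = NULL;
	const char *link = toy_follow_link(inode, &cookie);
	size_t len;

	if (IS_ERR(link))
		return (int)PTR_ERR(link);
	len = strlen(link);
	if (len > buflen)
		len = buflen;
	memcpy(buf, link, len);
	if (cookie)
		toy_put_link(inode, cookie);
	return (int)len;
}
```

The point of the sketch is that the caller never has to look inside any per-walk state to find the body or the error; both come back in the return value.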

* some preparations for link_path_walk() massage:
[30/110] namei.c: separate the parts of follow_link() that find the link body
[31/110] namei: don't bother with ->follow_link() if ->i_link is set
[32/110] namei: introduce nameidata->link
[33/110] do_last: move path there from caller's stack frame

* link_path_walk()/follow_link() rewrite. It's done in small steps,
getting the damn thing into a form that would allow getting rid of recursion:

[34/110] namei: expand nested_symlink() in its only caller
[35/110] namei: expand the call of follow_link() in link_path_walk()
[36/110] namei: move the calls of may_follow_link() into follow_link()
[37/110] namei: rename follow_link to trailing_symlink, move it down
[38/110] link_path_walk: handle get_link() returning ERR_PTR() immediately
[39/110] link_path_walk: don't bother with walk_component() after jumping link
[40/110] link_path_walk: turn inner loop into explicit goto
[41/110] link_path_walk: massage a bit more
[42/110] link_path_walk: get rid of duplication
[43/110] link_path_walk: final preparations to killing recursion

* dumb and straightforward "put all variables we needed to be separate in
each frame into an array of structs" killing of recursion. Array is still
on stack frame of link_path_walk(), we are still with the same depth limits,
etc. All of that will change shortly afterwards.
[44/110] link_path_walk: kill the recursion
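
The general technique can be sketched in userspace roughly as follows. This is a toy, not the actual link_path_walk(): symlinks live in a flat hypothetical table (struct toy_link), the only per-level state is the saved "rest of the name", and the walker merely prints the expanded path instead of doing lookups. It also treats absolute bodies like relative ones. What it does show is the replacement of a recursive call by a push onto an explicit array and of the return by a pop, with a single limit of 40 on the total number of traversals.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAXSYMLINKS 40

/* Hypothetical symlink table entry: component name -> link body. */
struct toy_link {
	const char *name;
	const char *body;
};

static const char *toy_lookup(const struct toy_link *tab, size_t n,
			      const char *comp, size_t len)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(tab[i].name) == len && !memcmp(tab[i].name, comp, len))
			return tab[i].body;
	return NULL;
}

/* Expand all symlinks in 'name' into 'out' without recursion. */
int toy_walk(const struct toy_link *tab, size_t n, const char *name,
	     char *out, size_t outlen)
{
	const char *stack[MAXSYMLINKS];	/* saved rest-of-name per nesting level */
	int depth = 0, total = 0;
	size_t used = 0;

	if (!outlen)
		return -ENAMETOOLONG;
	out[0] = '\0';
	for (;;) {
		const char *body;
		size_t len;

		while (*name == '/')
			name++;
		if (!*name) {
			if (!depth)
				return 0;	/* outermost name is finished */
			name = stack[--depth];	/* "return" from nested link */
			continue;
		}
		len = strcspn(name, "/");
		body = toy_lookup(tab, n, name, len);
		if (body) {
			if (total >= MAXSYMLINKS)
				return -ELOOP;
			total++;
			stack[depth++] = name + len;	/* "call": save where we were */
			name = body;
			continue;
		}
		/* ordinary component: append it to the output */
		if (used + len + 2 > outlen)
			return -ENAMETOOLONG;
		if (used)
			out[used++] = '/';
		memcpy(out + used, name, len);
		used += len;
		out[used] = '\0';
		name += len;
	}
}
```

Since every push is preceded by the total-count check, depth can never exceed the number of traversals, so a 40-entry array is always enough in this toy.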

* post-operation cleanup:
[45/110] link_path_walk: split "return from recursive call" path
[46/110] link_path_walk: cleanup - turn goto start; into continue;

* array moves into nameidata, with extra (0th) element used instead of
link/cookie pairs of local variables we used to have in trailing symlink
loops:
[47/110] namei: move link/cookie pairs into nameidata

* more post-op cleanup - arguments of several functions are always the
fields of the same array element or pointers to such:
[48/110] namei: trim redundant arguments of trailing_symlink()
[49/110] namei: trim redundant arguments of fs/namei.c:put_link()
[50/110] namei: trim the arguments of get_link()

* trim the array in nameidata to a couple of elements, use it until we
need more and allocate a big (41-element) array when needed. Result:
no more nesting limitations.
[51/110] namei: remove restrictions on nesting depth
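
A rough userspace sketch of that "small embedded array, big one on demand" idea follows. The structure and function names (toy_nameidata, toy_nd_push() and so on) and the constants are assumptions made for the illustration and are not taken from the patch; only the sizes, a couple of embedded elements and 41 in the big array, follow the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_EMBEDDED	2	/* elements living inside the structure */
#define TOY_BIG		41	/* size of the on-demand array */

struct toy_saved {
	const char *name;	/* rest of the pathname to return to */
	void *cookie;		/* whatever put_link would need */
};

struct toy_nameidata {
	unsigned depth;
	struct toy_saved *stack;		/* points at internal[] or heap */
	struct toy_saved internal[TOY_EMBEDDED];
};

void toy_nd_init(struct toy_nameidata *nd)
{
	nd->depth = 0;
	nd->stack = nd->internal;
}

/* Push one level; switch to a heap array once the embedded one is full. */
int toy_nd_push(struct toy_nameidata *nd, const char *name)
{
	if (nd->depth >= TOY_BIG)
		return -ELOOP;
	if (nd->depth == TOY_EMBEDDED && nd->stack == nd->internal) {
		struct toy_saved *p = malloc(TOY_BIG * sizeof(*p));

		if (!p)
			return -ENOMEM;
		memcpy(p, nd->internal, sizeof(nd->internal));
		nd->stack = p;
	}
	nd->stack[nd->depth].name = name;
	nd->stack[nd->depth].cookie = NULL;
	nd->depth++;
	return 0;
}

/* Done with this instance: free the big array if we ever allocated it. */
void toy_nd_release(struct toy_nameidata *nd)
{
	if (nd->stack != nd->internal) {
		free(nd->stack);
		nd->stack = nd->internal;
	}
	nd->depth = 0;
}
```

The common case (no symlinks, or one or two levels) never touches the allocator, which is what keeps the footprint comparable to the symlink-free path.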

* the use of array is seriously counterintuitive - for everything except
the last component we are _not_ using the first element of array; moreover,
way too many places are manipulating nd->depth. I've tried to sanitize
all of that in one patch, but after several failures I ended up splitting
the massage into smaller steps. The composition of those has ended up
fairly small, but convincing yourself that it's correct will mean reproducing
this splitup ;-/
[52/110] link_path_walk: nd->depth massage, part 1
[53/110] link_path_walk: nd->depth massage, part 2
[54/110] link_path_walk: nd->depth massage, part 3
[55/110] link_path_walk: nd->depth massage, part 4
[56/110] trailing_symlink: nd->depth massage, part 5
[57/110] get_link: nd->depth massage, part 6
[58/110] trailing_symlink: nd->depth massage, part 7
[59/110] put_link: nd->depth massage, part 8
[60/110] link_path_walk: nd->depth massage, part 9
[61/110] link_path_walk: nd->depth massage, part 10
[62/110] link_path_walk: end of nd->depth massage
[63/110] namei: we never need more than MAXSYMLINKS entries in nd->stack

* getting more regular rules for calling terminate_walk()/put_link().
The reason for that part is (besides getting simpler logics in the end) that
with RCU-symlink series we'll really need to have terminate_walk() trigger
put_link() on everything we have on stack:

[64/110] namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into
callers
[65/110] lift terminate_walk() into callers of walk_component()
[66/110] namei: lift (open-coded) terminate_walk() into callers of get_link()
[67/110] namei: take put_link() into {lookup,mountpoint,do}_last()
[68/110] namei: have terminate_walk() do put_link() on everything left
[69/110] link_path_walk: move the OK: inside the loop

* more put_link() work - basically, we are consolidating the "we decide
that we'd found a symlink that needs to be followed" logics. Again,
that's one of the areas RCU-symlink will need to modify:

[70/110] namei: new calling conventions for walk_component()
[71/110] namei: make should_follow_link() store the link in nd->link
[72/110] namei: move link count check and stack allocation into pick_link()

Next 4 commits are about not passing nameidata to ->follow_link() -
it's the same situation as with pt_regs and IRQ handlers and the same kind
of solution. Only 2 instances out of 26 use nameidata at all, and only
to pass it to nd_jump_link(); seeing that we already have the "begin
using this nameidata instance/end using it" pairs bracketing the areas
where thread works with given instance (we need that for sane freeing
of on-demand allocated nd->stack), the solution is obvious...

[73/110] lustre: rip the private symlink nesting limit out
[74/110][N] VFS: replace {, total_}link_count in task_struct with pointer to
nameidata
[75/110] namei: simplify the callers of follow_managed()
[76/110] don't pass nameidata to ->follow_link()
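
The pattern (per-thread pointer to the instance currently in use, set and restored by the begin/end pair, so that callbacks need no explicit argument) might be sketched in userspace like this. All names here are made up for the illustration; in particular toy_current_nd stands in for a pointer kept in task_struct, and C11 _Thread_local stands in for "per task".

```c
#include <assert.h>
#include <stddef.h>

struct toy_nameidata {
	int jumped;			/* set by toy_nd_jump_link() */
	struct toy_nameidata *saved;	/* instance that was active before us */
};

/* Stand-in for a pointer hanging off task_struct. */
_Thread_local struct toy_nameidata *toy_current_nd;

/* Begin using this instance; remember the one it nests inside. */
void toy_set_nameidata(struct toy_nameidata *nd)
{
	nd->jumped = 0;
	nd->saved = toy_current_nd;
	toy_current_nd = nd;
}

/* End using the current instance; the outer one becomes current again. */
void toy_restore_nameidata(void)
{
	toy_current_nd = toy_current_nd->saved;
}

/* What a ->follow_link() instance would call: no nameidata argument. */
void toy_nd_jump_link(void)
{
	toy_current_nd->jumped = 1;
}
```

Nesting works because each instance remembers its predecessor, so a lookup started from inside another lookup restores the outer instance when it finishes.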

That's where the parts common with v2 end. The rest is new.
get_link() reorganization - make it handle the jump-to-root and return
the remaining part of link to follow, with "/" treated as we do treat
pure jumps (i.e. return NULL). Adjust callers (sucking a lot of duplication
into get_link()).

[77/110] namei: simplify failure exits in get_link()
[78/110] namei: simpler treatment of symlinks with nothing other that / in the body
[79/110] namei: take the treatment of absolute symlinks to get_link()

convert complete_walk() to use of unlazy_walk()/terminate_walk():
[80/110] namei: fold put_link() into the failure case of complete_walk()

the next bunch takes pushing the link on stack from get_link() to pick_link():

[81/110] namei: move bumping the refcount of link->mnt into pick_link()
[82/110] may_follow_link(): trim arguments
[83/110] namei: kill nd->link
[84/110] namei: take increment of nd->depth into pick_link()

Lifting terminate_walk() and link_path_walk() calls up, reorganization and
cleanup of top-level loops:

[85/110] namei: may_follow_link() - lift terminate_walk() on failures into caller
[86/110] namei: split off filename_lookupat() with LOOKUP_PARENT
[87/110] namei: get rid of nameidata->base
[88/110] namei: path_init() calling conventions change
[89/110] namei: lift link_path_walk() call out of trailing_symlink()
[90/110] namei: lift terminate_walk() all the way up
[91/110] link_path_walk: use explicit returns for failure exits

Preparations for RCU work: we need to stop mangling nd->seq when finding
a symlink and we'll need to keep inode as well as vfsmount/dentry in
the stack:

[92/110] namei: explicitly pass seq number to unlazy_walk() when dentry != NULL
[93/110] namei: don't mangle nd->seq in lookup_fast()
[94/110] namei: store inode in nd->stack[]
[95/110][D] VFS: Handle lower layer dentry/inode in pathwalk
[96/110] namei: pick_link() callers already have inode

More prep work: RCU-safe Linux S&M interaction (that one's entirely Neil's)
[97/110][N] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()
[98/110][N] security: make inode_follow_link RCU-walk aware

minor ->put_link() API change:
[99/110] switch ->put_link() from dentry to inode
[100/110] new helper: free_page_put_link()

More prep work: put_link() and may_follow_link() made RCU-safe
[101/110] namei: make put_link() RCU-safe
[102/110] namei: make may_follow_link() safe in RCU mode

Legitimizing nd->stack[] on unlazy_walk():
[103/110] new helper: __legitimize_mnt()
[104/110] namei: store seq numbers in nd->stack[]
[105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

... and payoff - shift unlazying down to get_link() and narrow it to the
"we don't have ->i_link and have to call the actual method" case:
[106/110] namei: don't unlazy until get_link()
[107/110][N] VFS/namei: make the use of touch_atime() in get_link() RCU-safe.
[108/110] enable passing fast relative symlinks without dropping out of RCU mode
[109/110] namei: handle absolute symlinks without dropping out of RCU mode

documenting the API changes so far:
[110/110] update Documentation/filesystems/ regarding the follow_link/put_link changes

2015-05-11 18:36:25

by Al Viro

Subject: [PATCH v3 030/110] namei.c: separate the parts of follow_link() that find the link body

From: Al Viro <[email protected]>

Split the part of fs/namei.c:follow_link() that obtains the link body
into a separate function, get_link(). follow_link() itself is converted
to calling get_link() and then doing the body traversal (if any).

The next step will expand the follow_link() call in link_path_walk(),
and this helps to keep the size down...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 65 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index aeca448..a1ba556 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -836,21 +836,22 @@ static int may_linkat(struct path *link)
return -EPERM;
}

-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline const char *
+get_link(struct path *link, struct nameidata *nd, void **p)
{
struct dentry *dentry = link->dentry;
+ struct inode *inode = dentry->d_inode;
int error;
- const char *s;
+ const char *res;

BUG_ON(nd->flags & LOOKUP_RCU);

if (link->mnt == nd->path.mnt)
mntget(link->mnt);

- error = -ELOOP;
+ res = ERR_PTR(-ELOOP);
if (unlikely(current->total_link_count >= 40))
- goto out_put_nd_path;
+ goto out;

cond_resched();
current->total_link_count++;
@@ -858,37 +859,43 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
touch_atime(link);

error = security_inode_follow_link(dentry);
+ res = ERR_PTR(error);
if (error)
- goto out_put_nd_path;
+ goto out;

nd->last_type = LAST_BIND;
*p = NULL;
- s = dentry->d_inode->i_op->follow_link(dentry, p, nd);
- error = PTR_ERR(s);
- if (IS_ERR(s))
- goto out_put_nd_path;
-
- error = 0;
- if (s) {
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd, link, *p);
+ res = inode->i_op->follow_link(dentry, p, nd);
+ if (IS_ERR(res)) {
+out:
+ path_put(&nd->path);
+ path_put(link);
}
+ return res;
+}

- return error;
+static __always_inline int
+follow_link(struct path *link, struct nameidata *nd, void **p)
+{
+ const char *s = get_link(link, nd, p);
+ int error;

-out_put_nd_path:
- path_put(&nd->path);
- path_put(link);
+ if (unlikely(IS_ERR(s)))
+ return PTR_ERR(s);
+ if (unlikely(!s))
+ return 0;
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ error = link_path_walk(s, nd);
+ if (unlikely(error))
+ put_link(nd, link, *p);
return error;
}

--
2.1.4
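
For readers skimming the diff above, the three-way result that the caller has to handle after the split can be sketched in userspace as follows: an ERR_PTR-encoded error, NULL for "nothing left to walk", or a body that is either absolute or relative. The toy_ structures and functions are hypothetical and the error-pointer helpers are simplified stand-ins; the real get_link() also deals with refcounts, atime and security hooks, none of which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical link: either a negative errno, or a body (possibly NULL). */
struct toy_link {
	const char *body;
	int err;
};

/* Hypothetical walk state: did we jump to root, and what is left to walk? */
struct toy_walk {
	int at_root;
	const char *to_walk;
};

/* Only finds the body; knows nothing about how it will be traversed. */
const char *toy_get_link(const struct toy_link *link)
{
	if (link->err)
		return ERR_PTR(link->err);
	return link->body;
}

/* Decides what to do with the body returned by toy_get_link(). */
int toy_follow_link(const struct toy_link *link, struct toy_walk *w)
{
	const char *s = toy_get_link(link);

	w->at_root = 0;
	w->to_walk = NULL;
	if (IS_ERR(s))
		return (int)PTR_ERR(s);
	if (!s)
		return 0;		/* pure jump: nothing to traverse */
	if (*s == '/')
		w->at_root = 1;		/* absolute body: restart from root */
	w->to_walk = s;
	return 0;
}
```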

2015-05-11 18:36:18

by Al Viro

Subject: [PATCH v3 031/110] namei: don't bother with ->follow_link() if ->i_link is set

From: Al Viro <[email protected]>

with new calling conventions it's trivial

Signed-off-by: Al Viro <[email protected]>

Conflicts:
fs/namei.c
---
fs/namei.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a1ba556..2ffb4af 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -865,11 +865,14 @@ get_link(struct path *link, struct nameidata *nd, void **p)

nd->last_type = LAST_BIND;
*p = NULL;
- res = inode->i_op->follow_link(dentry, p, nd);
- if (IS_ERR(res)) {
+ res = inode->i_link;
+ if (!res) {
+ res = inode->i_op->follow_link(dentry, p, nd);
+ if (IS_ERR(res)) {
out:
- path_put(&nd->path);
- path_put(link);
+ path_put(&nd->path);
+ path_put(link);
+ }
}
return res;
}
@@ -4418,11 +4421,14 @@ EXPORT_SYMBOL(readlink_copy);
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
void *cookie;
- const char *link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ const char *link = dentry->d_inode->i_link;
int res;

- if (IS_ERR(link))
- return PTR_ERR(link);
+ if (!link) {
+ link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
res = readlink_copy(buffer, buflen, link);
if (cookie && dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, cookie);
--
2.1.4

2015-05-11 18:36:14

by Al Viro

Subject: [PATCH v3 032/110] namei: introduce nameidata->link

From: Al Viro <[email protected]>

Shares space with nameidata->last. walk_component() et al. store
the struct path of the symlink there instead of returning it into a
variable passed by the caller.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 62 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2ffb4af..06c3e4a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -494,7 +494,10 @@ EXPORT_SYMBOL(path_put);

struct nameidata {
struct path path;
- struct qstr last;
+ union {
+ struct qstr last;
+ struct path link;
+ };
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
@@ -1559,8 +1562,9 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static int walk_component(struct nameidata *nd, struct path *path, int follow)
+static int walk_component(struct nameidata *nd, int follow)
{
+ struct path path;
struct inode *inode;
int err;
/*
@@ -1570,38 +1574,39 @@ static int walk_component(struct nameidata *nd, struct path *path, int follow)
*/
if (unlikely(nd->last_type != LAST_NORM))
return handle_dots(nd, nd->last_type);
- err = lookup_fast(nd, path, &inode);
+ err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
goto out_err;

- err = lookup_slow(nd, path);
+ err = lookup_slow(nd, &path);
if (err < 0)
goto out_err;

- inode = path->dentry->d_inode;
+ inode = path.dentry->d_inode;
err = -ENOENT;
- if (d_is_negative(path->dentry))
+ if (d_is_negative(path.dentry))
goto out_path_put;
}

- if (should_follow_link(path->dentry, follow)) {
+ if (should_follow_link(path.dentry, follow)) {
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path->mnt ||
- unlazy_walk(nd, path->dentry))) {
+ if (unlikely(nd->path.mnt != path.mnt ||
+ unlazy_walk(nd, path.dentry))) {
err = -ECHILD;
goto out_err;
}
}
- BUG_ON(inode != path->dentry->d_inode);
+ BUG_ON(inode != path.dentry->d_inode);
+ nd->link = path;
return 1;
}
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
nd->inode = inode;
return 0;

out_path_put:
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
out_err:
terminate_walk(nd);
return err;
@@ -1614,12 +1619,12 @@ out_err:
* Without that kind of total limit, nasty chains of consecutive
* symlinks can cause almost arbitrarily long lookups.
*/
-static inline int nested_symlink(struct path *path, struct nameidata *nd)
+static inline int nested_symlink(struct nameidata *nd)
{
int res;

if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(path, nd);
+ path_put_conditional(&nd->link, nd);
path_put(&nd->path);
return -ELOOP;
}
@@ -1629,13 +1634,13 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
current->link_count++;

do {
- struct path link = *path;
+ struct path link = nd->link;
void *cookie;

res = follow_link(&link, nd, &cookie);
if (res)
break;
- res = walk_component(nd, path, LOOKUP_FOLLOW);
+ res = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
} while (res > 0);

@@ -1770,7 +1775,6 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct path next;
int err;

while (*name=='/')
@@ -1830,17 +1834,17 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

- err = walk_component(nd, &next, LOOKUP_FOLLOW);
+ err = walk_component(nd, LOOKUP_FOLLOW);
if (err < 0)
return err;

if (err) {
- err = nested_symlink(&next, nd);
+ err = nested_symlink(nd);
if (err)
return err;
}
if (!d_can_lookup(nd->path.dentry)) {
- err = -ENOTDIR;
+ err = -ENOTDIR;
break;
}
}
@@ -1960,20 +1964,19 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

-static inline int lookup_last(struct nameidata *nd, struct path *path)
+static inline int lookup_last(struct nameidata *nd)
{
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- return walk_component(nd, path, nd->flags & LOOKUP_FOLLOW);
+ return walk_component(nd, nd->flags & LOOKUP_FOLLOW);
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
static int path_lookupat(int dfd, const struct filename *name,
unsigned int flags, struct nameidata *nd)
{
- struct path path;
int err;

/*
@@ -1992,10 +1995,10 @@ static int path_lookupat(int dfd, const struct filename *name,
*/
err = path_init(dfd, name, flags, nd);
if (!err && !(flags & LOOKUP_PARENT)) {
- err = lookup_last(nd, &path);
+ err = lookup_last(nd);
while (err > 0) {
void *cookie;
- struct path link = path;
+ struct path link = nd->link;
err = may_follow_link(&link, nd);
if (unlikely(err))
break;
@@ -2003,7 +2006,7 @@ static int path_lookupat(int dfd, const struct filename *name,
err = follow_link(&link, nd, &cookie);
if (err)
break;
- err = lookup_last(nd, &path);
+ err = lookup_last(nd);
put_link(nd, &link, cookie);
}
}
@@ -2312,8 +2315,10 @@ done:
}
path->dentry = dentry;
path->mnt = nd->path.mnt;
- if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW))
+ if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
+ nd->link = *path;
return 1;
+ }
mntget(path->mnt);
follow_mount(path);
error = 0;
@@ -3040,6 +3045,7 @@ finish_lookup:
}
}
BUG_ON(inode != path->dentry->d_inode);
+ nd->link = *path;
return 1;
}

@@ -3228,7 +3234,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,

error = do_last(nd, &path, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
- struct path link = path;
+ struct path link = nd->link;
void *cookie;
error = may_follow_link(&link, nd);
if (unlikely(error))
--
2.1.4

2015-05-11 18:35:58

by Al Viro

Subject: [PATCH v3 033/110] do_last: move path there from caller's stack frame

From: Al Viro <[email protected]>

We used to need it to feed to follow_link(). No more...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 50 +++++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 06c3e4a..8582907 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2911,7 +2911,7 @@ out_dput:
/*
* Handle the last step of open()
*/
-static int do_last(struct nameidata *nd, struct path *path,
+static int do_last(struct nameidata *nd,
struct file *file, const struct open_flags *op,
int *opened, struct filename *name)
{
@@ -2922,6 +2922,7 @@ static int do_last(struct nameidata *nd, struct path *path,
int acc_mode = op->acc_mode;
struct inode *inode;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
+ struct path path;
bool retried = false;
int error;

@@ -2939,7 +2940,7 @@ static int do_last(struct nameidata *nd, struct path *path,
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
/* we _can_ be in RCU mode here */
- error = lookup_fast(nd, path, &inode);
+ error = lookup_fast(nd, &path, &inode);
if (likely(!error))
goto finish_lookup;

@@ -2977,7 +2978,7 @@ retry_lookup:
*/
}
mutex_lock(&dir->d_inode->i_mutex);
- error = lookup_open(nd, path, file, op, got_write, opened);
+ error = lookup_open(nd, &path, file, op, got_write, opened);
mutex_unlock(&dir->d_inode->i_mutex);

if (error <= 0) {
@@ -2997,15 +2998,15 @@ retry_lookup:
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
- path_to_nameidata(path, nd);
+ path_to_nameidata(&path, nd);
goto finish_open_created;
}

/*
* create/update audit record if it already exists.
*/
- if (d_is_positive(path->dentry))
- audit_inode(name, path->dentry, 0);
+ if (d_is_positive(path.dentry))
+ audit_inode(name, path.dentry, 0);

/*
* If atomic_open() acquired write access it is dropped now due to
@@ -3021,7 +3022,7 @@ retry_lookup:
if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
goto exit_dput;

- error = follow_managed(path, nd->flags);
+ error = follow_managed(&path, nd->flags);
if (error < 0)
goto exit_dput;

@@ -3029,38 +3030,38 @@ retry_lookup:
nd->flags |= LOOKUP_JUMPED;

BUG_ON(nd->flags & LOOKUP_RCU);
- inode = path->dentry->d_inode;
+ inode = path.dentry->d_inode;
error = -ENOENT;
- if (d_is_negative(path->dentry)) {
- path_to_nameidata(path, nd);
+ if (d_is_negative(path.dentry)) {
+ path_to_nameidata(&path, nd);
goto out;
}
finish_lookup:
- if (should_follow_link(path->dentry, nd->flags & LOOKUP_FOLLOW)) {
+ if (should_follow_link(path.dentry, nd->flags & LOOKUP_FOLLOW)) {
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path->mnt ||
- unlazy_walk(nd, path->dentry))) {
+ if (unlikely(nd->path.mnt != path.mnt ||
+ unlazy_walk(nd, path.dentry))) {
error = -ECHILD;
goto out;
}
}
- BUG_ON(inode != path->dentry->d_inode);
- nd->link = *path;
+ BUG_ON(inode != path.dentry->d_inode);
+ nd->link = path;
return 1;
}

- if (unlikely(d_is_symlink(path->dentry)) && !(open_flag & O_PATH)) {
- path_to_nameidata(path, nd);
+ if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
+ path_to_nameidata(&path, nd);
error = -ELOOP;
goto out;
}

- if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path->mnt) {
- path_to_nameidata(path, nd);
+ if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
+ path_to_nameidata(&path, nd);
} else {
save_parent.dentry = nd->path.dentry;
- save_parent.mnt = mntget(path->mnt);
- nd->path.dentry = path->dentry;
+ save_parent.mnt = mntget(path.mnt);
+ nd->path.dentry = path.dentry;

}
nd->inode = inode;
@@ -3122,7 +3123,7 @@ out:
return error;

exit_dput:
- path_put_conditional(path, nd);
+ path_put_conditional(&path, nd);
goto out;
exit_fput:
fput(file);
@@ -3213,7 +3214,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
struct nameidata *nd, const struct open_flags *op, int flags)
{
struct file *file;
- struct path path;
int opened = 0;
int error;

@@ -3232,7 +3232,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
goto out;

- error = do_last(nd, &path, file, op, &opened, pathname);
+ error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = nd->link;
void *cookie;
@@ -3244,7 +3244,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
error = follow_link(&link, nd, &cookie);
if (unlikely(error))
break;
- error = do_last(nd, &path, file, op, &opened, pathname);
+ error = do_last(nd, file, op, &opened, pathname);
put_link(nd, &link, cookie);
}
out:
--
2.1.4

2015-05-11 18:36:29

by Al Viro

Subject: [PATCH v3 034/110] namei: expand nested_symlink() in its only caller

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 61 +++++++++++++++++++++++--------------------------------------
1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8582907..7fcda91 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1613,43 +1613,6 @@ out_err:
}

/*
- * This limits recursive symlink follows to 8, while
- * limiting consecutive symlinks to 40.
- *
- * Without that kind of total limit, nasty chains of consecutive
- * symlinks can cause almost arbitrarily long lookups.
- */
-static inline int nested_symlink(struct nameidata *nd)
-{
- int res;
-
- if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(&nd->link, nd);
- path_put(&nd->path);
- return -ELOOP;
- }
- BUG_ON(nd->depth >= MAX_NESTED_LINKS);
-
- nd->depth++;
- current->link_count++;
-
- do {
- struct path link = nd->link;
- void *cookie;
-
- res = follow_link(&link, nd, &cookie);
- if (res)
- break;
- res = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
- } while (res > 0);
-
- current->link_count--;
- nd->depth--;
- return res;
-}
-
-/*
* We can do the critical dentry name comparison and hashing
* operations one word at a time, but we are limited to:
*
@@ -1839,7 +1802,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;

if (err) {
- err = nested_symlink(nd);
+ if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
+ path_put_conditional(&nd->link, nd);
+ path_put(&nd->path);
+ return -ELOOP;
+ }
+ BUG_ON(nd->depth >= MAX_NESTED_LINKS);
+
+ nd->depth++;
+ current->link_count++;
+
+ do {
+ struct path link = nd->link;
+ void *cookie;
+
+ err = follow_link(&link, nd, &cookie);
+ if (err)
+ break;
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &link, cookie);
+ } while (err > 0);
+
+ current->link_count--;
+ nd->depth--;
if (err)
return err;
}
--
2.1.4

2015-05-11 18:36:06

by Al Viro

Subject: [PATCH v3 035/110] namei: expand the call of follow_link() in link_path_walk()

From: Al Viro <[email protected]>

... and strip __always_inline from follow_link() - remaining callers
don't need that.

Now link_path_walk() recursion is a direct one.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7fcda91..2354d4f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -880,8 +880,7 @@ out:
return res;
}

-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static int follow_link(struct path *link, struct nameidata *nd, void **p)
{
const char *s = get_link(link, nd, p);
int error;
@@ -1815,10 +1814,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
do {
struct path link = nd->link;
void *cookie;
+ const char *s = get_link(&link, nd, &cookie);

- err = follow_link(&link, nd, &cookie);
- if (err)
+ if (unlikely(IS_ERR(s))) {
+ err = PTR_ERR(s);
break;
+ }
+ err = 0;
+ if (likely(s)) {
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ err = link_path_walk(s, nd);
+ if (unlikely(err)) {
+ put_link(nd, &link, cookie);
+ break;
+ }
+ }
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
} while (err > 0);
--
2.1.4

2015-05-11 18:36:03

by Al Viro

Subject: [PATCH v3 036/110] namei: move the calls of may_follow_link() into follow_link()

From: Al Viro <[email protected]>

All remaining calls of follow_link() are immediately preceded by a call
of may_follow_link(), so fold the check into follow_link() itself.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2354d4f..f7659ee 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -882,9 +882,12 @@ out:

static int follow_link(struct path *link, struct nameidata *nd, void **p)
{
- const char *s = get_link(link, nd, p);
- int error;
-
+ const char *s;
+ int error = may_follow_link(link, nd);
+ if (unlikely(error))
+ return error;
+ nd->flags |= LOOKUP_PARENT;
+ s = get_link(link, nd, p);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
@@ -2002,10 +2005,6 @@ static int path_lookupat(int dfd, const struct filename *name,
while (err > 0) {
void *cookie;
struct path link = nd->link;
- err = may_follow_link(&link, nd);
- if (unlikely(err))
- break;
- nd->flags |= LOOKUP_PARENT;
err = follow_link(&link, nd, &cookie);
if (err)
break;
@@ -2352,10 +2351,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
while (err > 0) {
void *cookie;
struct path link = *path;
- err = may_follow_link(&link, nd);
- if (unlikely(err))
- break;
- nd->flags |= LOOKUP_PARENT;
err = follow_link(&link, nd, &cookie);
if (err)
break;
@@ -3239,10 +3234,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
while (unlikely(error > 0)) { /* trailing symlink */
struct path link = nd->link;
void *cookie;
- error = may_follow_link(&link, nd);
- if (unlikely(error))
- break;
- nd->flags |= LOOKUP_PARENT;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
error = follow_link(&link, nd, &cookie);
if (unlikely(error))
--
2.1.4

2015-05-11 18:30:46

by Al Viro

Subject: [PATCH v3 037/110] namei: rename follow_link to trailing_symlink, move it down

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 62 ++++++++++++++++++++++++++++++--------------------------------
1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f7659ee..d729ef7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -668,8 +668,6 @@ static __always_inline void set_root(struct nameidata *nd)
get_fs_root(current->fs, &nd->root);
}

-static int link_path_walk(const char *, struct nameidata *);
-
static __always_inline unsigned set_root_rcu(struct nameidata *nd)
{
struct fs_struct *fs = current->fs;
@@ -880,33 +878,6 @@ out:
return res;
}

-static int follow_link(struct path *link, struct nameidata *nd, void **p)
-{
- const char *s;
- int error = may_follow_link(link, nd);
- if (unlikely(error))
- return error;
- nd->flags |= LOOKUP_PARENT;
- s = get_link(link, nd, p);
- if (unlikely(IS_ERR(s)))
- return PTR_ERR(s);
- if (unlikely(!s))
- return 0;
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd, link, *p);
- return error;
-}
-
static int follow_up_rcu(struct path *path)
{
struct mount *mnt = real_mount(path->mnt);
@@ -1970,6 +1941,33 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

+static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
+{
+ const char *s;
+ int error = may_follow_link(link, nd);
+ if (unlikely(error))
+ return error;
+ nd->flags |= LOOKUP_PARENT;
+ s = get_link(link, nd, p);
+ if (unlikely(IS_ERR(s)))
+ return PTR_ERR(s);
+ if (unlikely(!s))
+ return 0;
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
+ }
+ nd->inode = nd->path.dentry->d_inode;
+ error = link_path_walk(s, nd);
+ if (unlikely(error))
+ put_link(nd, link, *p);
+ return error;
+}
+
static inline int lookup_last(struct nameidata *nd)
{
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
@@ -2005,7 +2003,7 @@ static int path_lookupat(int dfd, const struct filename *name,
while (err > 0) {
void *cookie;
struct path link = nd->link;
- err = follow_link(&link, nd, &cookie);
+ err = trailing_symlink(&link, nd, &cookie);
if (err)
break;
err = lookup_last(nd);
@@ -2351,7 +2349,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
while (err > 0) {
void *cookie;
struct path link = *path;
- err = follow_link(&link, nd, &cookie);
+ err = trailing_symlink(&link, nd, &cookie);
if (err)
break;
err = mountpoint_last(nd, path);
@@ -3235,7 +3233,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
struct path link = nd->link;
void *cookie;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- error = follow_link(&link, nd, &cookie);
+ error = trailing_symlink(&link, nd, &cookie);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
--
2.1.4

2015-05-11 18:36:10

by Al Viro

Subject: [PATCH v3 038/110] link_path_walk: handle get_link() returning ERR_PTR() immediately

From: Al Viro <[email protected]>

If we get ERR_PTR() from get_link(), we are guaranteed to get err != 0
when we break out of do-while, so we are going to hit if (err) return err;
shortly after it. Pull that into the if (IS_ERR(s)) body.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index d729ef7..9937470 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1792,7 +1792,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- break;
+ current->link_count--;
+ nd->depth--;
+ return err;
}
err = 0;
if (likely(s)) {
--
2.1.4
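
For readers less familiar with the idiom the patch above relies on: get_link() reports failure by encoding a negative errno in the pointer it returns, and NULL means the link "jumped" and left no body to walk. What follows is only a userspace sketch under stated assumptions. The three helpers are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and fake_get_link() and classify() are hypothetical names made up for illustration, not anything in fs/namei.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical get_link(): error, "jumped" (NULL), or a link body. */
static const char *fake_get_link(int mode)
{
	if (mode < 0)
		return ERR_PTR(-ELOOP);
	if (mode == 0)
		return NULL;
	return "target/of/link";
}

/* Same shape as the patched code: leave at once on an error pointer. */
static int classify(int mode)
{
	const char *s = fake_get_link(mode);

	if (IS_ERR(s))
		return (int)PTR_ERR(s);	/* negative errno, returned immediately */
	if (!s)
		return 0;		/* jumped, nothing to walk */
	assert(*s != '\0');
	return 1;			/* there is a body to walk */
}
```

The point of the sketch is only that the error case never needs the rest of the loop body, which is why returning directly from the IS_ERR() branch is safe.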

2015-05-11 18:33:13

by Al Viro

Subject: [PATCH v3 039/110] link_path_walk: don't bother with walk_component() after jumping link

From: Al Viro <[email protected]>

... it does nothing if nd->last_type is LAST_BIND.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9937470..ee083f9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1797,7 +1797,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;
}
err = 0;
- if (likely(s)) {
+ if (unlikely(!s)) {
+ /* jumped */
+ put_link(nd, &link, cookie);
+ break;
+ } else {
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -1812,9 +1816,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)
put_link(nd, &link, cookie);
break;
}
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &link, cookie);
}
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
} while (err > 0);

current->link_count--;
--
2.1.4

2015-05-11 18:33:03

by Al Viro

Subject: [PATCH v3 040/110] link_path_walk: turn inner loop into explicit goto

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 61 ++++++++++++++++++++++++++++++++-----------------------------
1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ee083f9..6a0dd07 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1775,6 +1775,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return err;

if (err) {
+ struct path link;
+ void *cookie;
+ const char *s;
+
if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
path_put_conditional(&nd->link, nd);
path_put(&nd->path);
@@ -1785,41 +1789,40 @@ static int link_path_walk(const char *name, struct nameidata *nd)
nd->depth++;
current->link_count++;

- do {
- struct path link = nd->link;
- void *cookie;
- const char *s = get_link(&link, nd, &cookie);
-
- if (unlikely(IS_ERR(s))) {
- err = PTR_ERR(s);
- current->link_count--;
- nd->depth--;
- return err;
+loop: /* will be gone very soon */
+ link = nd->link;
+ s = get_link(&link, nd, &cookie);
+
+ if (unlikely(IS_ERR(s))) {
+ err = PTR_ERR(s);
+ current->link_count--;
+ nd->depth--;
+ return err;
+ }
+ err = 0;
+ if (unlikely(!s)) {
+ /* jumped */
+ put_link(nd, &link, cookie);
+ } else {
+ if (*s == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->flags |= LOOKUP_JUMPED;
}
- err = 0;
- if (unlikely(!s)) {
- /* jumped */
+ nd->inode = nd->path.dentry->d_inode;
+ err = link_path_walk(s, nd);
+ if (unlikely(err)) {
put_link(nd, &link, cookie);
- break;
} else {
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- err = link_path_walk(s, nd);
- if (unlikely(err)) {
- put_link(nd, &link, cookie);
- break;
- }
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
+ if (err > 0)
+ goto loop;
}
- } while (err > 0);
+ }

current->link_count--;
nd->depth--;
--
2.1.4

2015-05-11 18:33:10

by Al Viro

Subject: [PATCH v3 041/110] link_path_walk: massage a bit more

From: Al Viro <[email protected]>

Pull the block after the if-else at the end of what used to be the
do-while body into all branches there. We are almost done with the massage...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6a0dd07..b7ba718 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1803,6 +1803,8 @@ loop: /* will be gone very soon */
if (unlikely(!s)) {
/* jumped */
put_link(nd, &link, cookie);
+ current->link_count--;
+ nd->depth--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1816,18 +1818,23 @@ loop: /* will be gone very soon */
err = link_path_walk(s, nd);
if (unlikely(err)) {
put_link(nd, &link, cookie);
+ current->link_count--;
+ nd->depth--;
+ return err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
- if (err > 0)
+ current->link_count--;
+ nd->depth--;
+ if (err < 0)
+ return err;
+ if (err > 0) {
+ current->link_count++;
+ nd->depth++;
goto loop;
+ }
}
}
-
- current->link_count--;
- nd->depth--;
- if (err)
- return err;
}
if (!d_can_lookup(nd->path.dentry)) {
err = -ENOTDIR;
--
2.1.4

2015-05-11 18:33:06

by Al Viro

Subject: [PATCH v3 042/110] link_path_walk: get rid of duplication

From: Al Viro <[email protected]>

What we do after the second walk_component() + put_link() + depth
decrement in there is exactly equivalent to what's done right
after the first walk_component(). Easy to verify and not at all
surprising, seeing that there we have just walked the last
component of nested symlink.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b7ba718..9f45d33 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1771,6 +1771,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
return 0;

err = walk_component(nd, LOOKUP_FOLLOW);
+Walked:
if (err < 0)
return err;

@@ -1789,7 +1790,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
nd->depth++;
current->link_count++;

-loop: /* will be gone very soon */
link = nd->link;
s = get_link(&link, nd, &cookie);

@@ -1826,13 +1826,7 @@ loop: /* will be gone very soon */
put_link(nd, &link, cookie);
current->link_count--;
nd->depth--;
- if (err < 0)
- return err;
- if (err > 0) {
- current->link_count++;
- nd->depth++;
- goto loop;
- }
+ goto Walked;
}
}
}
--
2.1.4

2015-05-11 18:32:57

by Al Viro

Subject: [PATCH v3 043/110] link_path_walk: final preparations to killing recursion

From: Al Viro <[email protected]>

Reduce the number of returns in there: turn every place where it
returns zero into goto OK and every place where it returns non-zero
into goto Err. The only non-trivial detail is that every break out
of the loop is guaranteed to happen with non-zero err.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9f45d33..b469ce2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1716,7 +1716,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
while (*name=='/')
name++;
if (!*name)
- return 0;
+ goto OK;

/* At this point we know we have a real path component. */
for(;;) {
@@ -1759,7 +1759,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)

name += hashlen_len(hash_len);
if (!*name)
- return 0;
+ goto OK;
/*
* If it wasn't NUL, we know it was '/'. Skip that
* slash, and continue until no more slashes.
@@ -1768,12 +1768,12 @@ static int link_path_walk(const char *name, struct nameidata *nd)
name++;
} while (unlikely(*name == '/'));
if (!*name)
- return 0;
+ goto OK;

err = walk_component(nd, LOOKUP_FOLLOW);
Walked:
if (err < 0)
- return err;
+ goto Err;

if (err) {
struct path link;
@@ -1783,7 +1783,8 @@ Walked:
if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
path_put_conditional(&nd->link, nd);
path_put(&nd->path);
- return -ELOOP;
+ err = -ELOOP;
+ goto Err;
}
BUG_ON(nd->depth >= MAX_NESTED_LINKS);

@@ -1797,7 +1798,7 @@ Walked:
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
- return err;
+ goto Err;
}
err = 0;
if (unlikely(!s)) {
@@ -1820,7 +1821,7 @@ Walked:
put_link(nd, &link, cookie);
current->link_count--;
nd->depth--;
- return err;
+ goto Err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd, &link, cookie);
@@ -1836,7 +1837,10 @@ Walked:
}
}
terminate_walk(nd);
+Err:
return err;
+OK:
+ return 0;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-11 18:32:54

by Al Viro

Subject: [PATCH v3 044/110] link_path_walk: kill the recursion

From: Al Viro <[email protected]>

Absolutely straightforward now - the only variables we need to preserve
across the recursive call are name, link and cookie, and recursion depth
is limited (and is equal to nd->depth). So arrange an array of
triples to hold instances of those and be done with that.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 40 +++++++++++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b469ce2..9844bb2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1711,8 +1711,14 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
+ struct saved {
+ struct path link;
+ void *cookie;
+ const char *name;
+ } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
int err;
-
+
+start:
while (*name=='/')
name++;
if (!*name)
@@ -1776,8 +1782,6 @@ Walked:
goto Err;

if (err) {
- struct path link;
- void *cookie;
const char *s;

if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
@@ -1790,22 +1794,25 @@ Walked:

nd->depth++;
current->link_count++;
+ last++;

- link = nd->link;
- s = get_link(&link, nd, &cookie);
+ last->link = nd->link;
+ s = get_link(&last->link, nd, &last->cookie);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
+ last--;
goto Err;
}
err = 0;
if (unlikely(!s)) {
/* jumped */
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1816,17 +1823,24 @@ Walked:
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
- err = link_path_walk(s, nd);
+ last->name = name;
+ name = s;
+ goto start;
+
+back:
+ name = last->name;
if (unlikely(err)) {
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
goto Err;
} else {
err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &link, cookie);
+ put_link(nd, &last->link, last->cookie);
current->link_count--;
nd->depth--;
+ last--;
goto Walked;
}
}
@@ -1838,9 +1852,13 @@ Walked:
}
terminate_walk(nd);
Err:
- return err;
+ if (likely(!nd->depth))
+ return err;
+ goto back;
OK:
- return 0;
+ if (likely(!nd->depth))
+ return 0;
+ goto back;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4
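
The transformation in the patch above (replace a recursive call with an explicit array indexed by depth) can be tried out in userspace. This is a toy sketch under loud assumptions, not the kernel logic: the "namespace" is a hard-coded table, toy_walk() and link_body() are hypothetical names, only the saved name is kept per level (the kernel also keeps link and cookie), and every component is walked here, whereas link_path_walk() stops short of the last one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NESTED 8	/* plays the role of MAX_NESTED_LINKS */

/* Hypothetical toy namespace: a component is a symlink if listed here. */
struct toy_link {
	const char *name;
	const char *body;
};

static const struct toy_link links[] = {
	{ "a", "b/c" },		/* a -> b/c */
	{ "c", "d" },		/* c -> d */
	{ "self", "self" },	/* never terminates on its own */
};

static const char *link_body(const char *comp, size_t len)
{
	for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); i++)
		if (strlen(links[i].name) == len &&
		    !memcmp(links[i].name, comp, len))
			return links[i].body;
	return NULL;
}

/*
 * Walk "name" without recursion.  Returns the number of non-symlink
 * components visited, or -ELOOP if links nest deeper than MAX_NESTED.
 */
static int toy_walk(const char *name)
{
	const char *saved[MAX_NESTED];	/* what each recursive frame would hold */
	int depth = 0, visited = 0;

	for (;;) {
		while (*name == '/')
			name++;
		if (!*name) {
			if (!depth)
				return visited;	/* outermost "call" returns */
			name = saved[--depth];	/* "return" into the caller */
			continue;
		}
		size_t len = strcspn(name, "/");
		const char *body = link_body(name, len);

		name += len;
		if (!body) {
			visited++;
			continue;
		}
		if (depth == MAX_NESTED)
			return -ELOOP;
		assert(depth < MAX_NESTED);	/* analogue of the BUG_ON() */
		saved[depth++] = name;		/* "call": remember where to resume */
		name = body;
	}
}
```

Because depth is bounded, the array has a fixed size and the stack footprint no longer depends on how deeply the links nest, which is the property the cover letter advertises.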

2015-05-11 18:32:46

by Al Viro

Subject: [PATCH v3 045/110] link_path_walk: split "return from recursive call" path

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9844bb2..5b0edd3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1826,23 +1826,6 @@ Walked:
last->name = name;
name = s;
goto start;
-
-back:
- name = last->name;
- if (unlikely(err)) {
- put_link(nd, &last->link, last->cookie);
- current->link_count--;
- nd->depth--;
- last--;
- goto Err;
- } else {
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &last->link, last->cookie);
- current->link_count--;
- nd->depth--;
- last--;
- goto Walked;
- }
}
}
if (!d_can_lookup(nd->path.dentry)) {
@@ -1852,13 +1835,24 @@ back:
}
terminate_walk(nd);
Err:
- if (likely(!nd->depth))
- return err;
- goto back;
+ while (unlikely(nd->depth)) {
+ put_link(nd, &last->link, last->cookie);
+ current->link_count--;
+ nd->depth--;
+ last--;
+ }
+ return err;
OK:
- if (likely(!nd->depth))
- return 0;
- goto back;
+ if (unlikely(nd->depth)) {
+ name = last->name;
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd, &last->link, last->cookie);
+ current->link_count--;
+ nd->depth--;
+ last--;
+ goto Walked;
+ }
+ return 0;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4
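
The split made by the patch above boils down to two different exits: on error nothing will be resumed, so every pending link is dropped in a loop, while on success exactly one link is dropped and the walk continues one level up. A minimal sketch of that asymmetry, assuming a toy structure that only counts releases (struct toy_nd and the toy_* functions are hypothetical and stand in for nameidata, put_link() and the Err/OK labels):

```c
#include <assert.h>

#define MAX_NESTED 8

struct toy_nd {
	int depth;		/* number of links currently pinned */
	int held[MAX_NESTED];	/* 1 while the entry is pinned */
	int released;		/* how many toy_put_link() calls happened */
};

/* Pin one more link; stands in for entering a nested symlink. */
static void toy_push(struct toy_nd *nd)
{
	assert(nd->depth < MAX_NESTED);
	nd->held[nd->depth++] = 1;
}

/* Release the innermost link; stands in for put_link() plus depth--. */
static void toy_put_link(struct toy_nd *nd)
{
	assert(nd->depth > 0 && nd->held[nd->depth - 1]);
	nd->held[--nd->depth] = 0;
	nd->released++;
}

/* Error exit: unwind everything, then report the error. */
static int toy_err(struct toy_nd *nd, int err)
{
	while (nd->depth)
		toy_put_link(nd);
	return err;
}

/*
 * Success exit of one nested body: release a single link.
 * Returns 1 if an outer level remains to be resumed, 0 if the walk is done.
 */
static int toy_ok(struct toy_nd *nd)
{
	if (!nd->depth)
		return 0;
	toy_put_link(nd);
	return 1;
}
```

In the real code the success path also restores the saved name and calls walk_component() before releasing the link; the sketch leaves that out and keeps only the bookkeeping.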

2015-05-11 18:32:50

by Al Viro

Subject: [PATCH v3 046/110] link_path_walk: cleanup - turn goto start; into continue;

From: Al Viro <[email protected]>

Deal with skipping leading slashes before what used to be the
recursive call. That way we can get rid of that goto completely.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5b0edd3..c105107 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1718,11 +1718,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
} stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
int err;

-start:
while (*name=='/')
name++;
if (!*name)
- goto OK;
+ return 0;

/* At this point we know we have a real path component. */
for(;;) {
@@ -1821,11 +1820,15 @@ Walked:
nd->path = nd->root;
path_get(&nd->root);
nd->flags |= LOOKUP_JUMPED;
+ while (unlikely(*++s == '/'))
+ ;
}
nd->inode = nd->path.dentry->d_inode;
last->name = name;
+ if (!*s)
+ goto OK;
name = s;
- goto start;
+ continue;
}
}
if (!d_can_lookup(nd->path.dentry)) {
--
2.1.4
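
The slash handling that the patch above moves in front of the former recursive call is small enough to isolate. A sketch, assuming a hypothetical helper skip_leading_slashes() that does not exist in the kernel: it notes whether the link body is absolute and, if so, steps past the whole run of leading slashes the same way the added "while (*++s == '/')" loop does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper.  If the link body is absolute, record that and
 * skip every leading slash.  An empty result means nothing is left to
 * walk, which corresponds to the "goto OK" case in the patch.
 */
static const char *skip_leading_slashes(const char *s, int *absolute)
{
	assert(s != NULL && absolute != NULL);
	*absolute = (*s == '/');
	if (*absolute) {
		while (*++s == '/')
			;
	}
	return s;
}
```

With the slashes consumed up front, the main loop can be re-entered with a plain continue instead of jumping back to a label placed before the slash-skipping code.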

2015-05-11 18:32:42

by Al Viro

Subject: [PATCH v3 047/110] namei: move link/cookie pairs into nameidata

From: Al Viro <[email protected]>

Array of MAX_NESTED_LINKS + 1 elements put into nameidata;
what used to be a local array in link_path_walk() occupies
entries 1 .. MAX_NESTED_LINKS in it, link and cookie from
the trailing symlink handling loops - entry 0.

This is _not_ the final arrangement; just an easily verified
incremental step.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c105107..d4b238a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,11 @@ struct nameidata {
int last_type;
unsigned depth;
struct file *base;
+ struct saved {
+ struct path link;
+ void *cookie;
+ const char *name;
+ } stack[MAX_NESTED_LINKS + 1];
};

/*
@@ -1711,11 +1716,7 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct saved {
- struct path link;
- void *cookie;
- const char *name;
- } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
+ struct saved *last = nd->stack;
int err;

while (*name=='/')
@@ -2030,13 +2031,13 @@ static int path_lookupat(int dfd, const struct filename *name,
if (!err && !(flags & LOOKUP_PARENT)) {
err = lookup_last(nd);
while (err > 0) {
- void *cookie;
- struct path link = nd->link;
- err = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ err = trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (err)
break;
err = lookup_last(nd);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
}

@@ -2376,13 +2377,13 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,

err = mountpoint_last(nd, path);
while (err > 0) {
- void *cookie;
- struct path link = *path;
- err = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ err = trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (err)
break;
err = mountpoint_last(nd, path);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
out:
path_cleanup(nd);
@@ -3259,14 +3260,14 @@ static struct file *path_openat(int dfd, struct filename *pathname,

error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
- struct path link = nd->link;
- void *cookie;
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- error = trailing_symlink(&link, nd, &cookie);
+ nd->stack[0].link = nd->link;
+ error= trailing_symlink(&nd->stack[0].link,
+ nd, &nd->stack[0].cookie);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- put_link(nd, &link, cookie);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
}
out:
path_cleanup(nd);
--
2.1.4
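
The slot layout described in the commit message above (entry 0 for the trailing symlink, entries 1 .. MAX_NESTED_LINKS for nested ones) can be spelled out as follows. This is a sketch with hypothetical toy_* names; struct path is replaced by an opaque pointer, and the value 8 for MAX_NESTED_LINKS is taken from the comment removed earlier in this series ("limits recursive symlink follows to 8").

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTED_LINKS 8

/* Toy counterpart of struct saved; "link" is opaque here. */
struct toy_saved {
	const void *link;
	void *cookie;
	const char *name;
};

struct toy_nameidata {
	unsigned depth;
	struct toy_saved stack[MAX_NESTED_LINKS + 1];
};

/* Entry 0 belongs to the trailing-symlink loops in the callers. */
static struct toy_saved *trailing_slot(struct toy_nameidata *nd)
{
	return &nd->stack[0];
}

/* Entries 1 .. MAX_NESTED_LINKS belong to nested links at that depth. */
static struct toy_saved *nested_slot(struct toy_nameidata *nd, unsigned depth)
{
	assert(depth >= 1 && depth <= MAX_NESTED_LINKS);
	return &nd->stack[depth];
}
```

Keeping one extra element is what lets the trailing-symlink handling and the nested walk share a single array without the two ever colliding on an index.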

2015-05-11 18:32:41

by Al Viro

Subject: [PATCH v3 048/110] namei: trim redundant arguments of trailing_symlink()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d4b238a..8c4f2af 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1971,14 +1971,15 @@ static void path_cleanup(struct nameidata *nd)
fput(nd->base);
}

-static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
+static int trailing_symlink(struct nameidata *nd)
{
const char *s;
- int error = may_follow_link(link, nd);
+ int error = may_follow_link(&nd->link, nd);
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
- s = get_link(link, nd, p);
+ nd->stack[0].link = nd->link;
+ s = get_link(&nd->stack[0].link, nd, &nd->stack[0].cookie);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
@@ -1994,7 +1995,7 @@ static int trailing_symlink(struct path *link, struct nameidata *nd, void **p)
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
if (unlikely(error))
- put_link(nd, link, *p);
+ put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
return error;
}

@@ -2031,9 +2032,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (!err && !(flags & LOOKUP_PARENT)) {
err = lookup_last(nd);
while (err > 0) {
- nd->stack[0].link = nd->link;
- err = trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ err = trailing_symlink(nd);
if (err)
break;
err = lookup_last(nd);
@@ -2377,9 +2376,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,

err = mountpoint_last(nd, path);
while (err > 0) {
- nd->stack[0].link = nd->link;
- err = trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ err = trailing_symlink(nd);
if (err)
break;
err = mountpoint_last(nd, path);
@@ -3261,9 +3258,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
error = do_last(nd, file, op, &opened, pathname);
while (unlikely(error > 0)) { /* trailing symlink */
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- nd->stack[0].link = nd->link;
- error= trailing_symlink(&nd->stack[0].link,
- nd, &nd->stack[0].cookie);
+ error = trailing_symlink(nd);
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
--
2.1.4

2015-05-11 18:30:44

by Al Viro

Subject: [PATCH v3 049/110] namei: trim redundant arguments of fs/namei.c:put_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8c4f2af..1ac3217 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -718,12 +718,13 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
nd->flags |= LOOKUP_JUMPED;
}

-static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
+static inline void put_link(struct nameidata *nd)
{
- struct inode *inode = link->dentry->d_inode;
- if (cookie && inode->i_op->put_link)
- inode->i_op->put_link(link->dentry, cookie);
- path_put(link);
+ struct saved *last = nd->stack + nd->depth;
+ struct inode *inode = last->link.dentry->d_inode;
+ if (last->cookie && inode->i_op->put_link)
+ inode->i_op->put_link(last->link.dentry, last->cookie);
+ path_put(&last->link);
}

int sysctl_protected_symlinks __read_mostly = 0;
@@ -1809,7 +1810,7 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1840,7 +1841,7 @@ Walked:
terminate_walk(nd);
Err:
while (unlikely(nd->depth)) {
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1850,7 +1851,7 @@ OK:
if (unlikely(nd->depth)) {
name = last->name;
err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd, &last->link, last->cookie);
+ put_link(nd);
current->link_count--;
nd->depth--;
last--;
@@ -1995,7 +1996,7 @@ static int trailing_symlink(struct nameidata *nd)
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
if (unlikely(error))
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
return error;
}

@@ -2036,7 +2037,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
}

@@ -2380,7 +2381,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
out:
path_cleanup(nd);
@@ -3262,7 +3263,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- put_link(nd, &nd->stack[0].link, nd->stack[0].cookie);
+ put_link(nd);
}
out:
path_cleanup(nd);
--
2.1.4

2015-05-11 18:30:36

by Al Viro

Subject: [PATCH v3 050/110] namei: trim the arguments of get_link()

From: Al Viro <[email protected]>

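Same story as the previous commit: get_link() now takes the link from
nd->link and keeps it, together with the cookie, in nd->stack[nd->depth],
so the extra arguments are redundant.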

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 45 +++++++++++++++++++++------------------------
1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1ac3217..e5715a5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -843,27 +843,33 @@ static int may_linkat(struct path *link)
return -EPERM;
}

-static __always_inline const char *
-get_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline
+const char *get_link(struct nameidata *nd)
{
- struct dentry *dentry = link->dentry;
+ struct saved *last = nd->stack + nd->depth;
+ struct dentry *dentry = nd->link.dentry;
struct inode *inode = dentry->d_inode;
int error;
const char *res;

BUG_ON(nd->flags & LOOKUP_RCU);

- if (link->mnt == nd->path.mnt)
- mntget(link->mnt);
+ if (nd->link.mnt == nd->path.mnt)
+ mntget(nd->link.mnt);

- res = ERR_PTR(-ELOOP);
- if (unlikely(current->total_link_count >= 40))
- goto out;
+ if (unlikely(current->total_link_count >= 40)) {
+ path_put(&nd->path);
+ path_put(&nd->link);
+ return ERR_PTR(-ELOOP);
+ }
+
+ last->link = nd->link;
+ last->cookie = NULL;

cond_resched();
current->total_link_count++;

- touch_atime(link);
+ touch_atime(&last->link);

error = security_inode_follow_link(dentry);
res = ERR_PTR(error);
@@ -871,14 +877,13 @@ get_link(struct path *link, struct nameidata *nd, void **p)
goto out;

nd->last_type = LAST_BIND;
- *p = NULL;
res = inode->i_link;
if (!res) {
- res = inode->i_op->follow_link(dentry, p, nd);
+ res = inode->i_op->follow_link(dentry, &last->cookie, nd);
if (IS_ERR(res)) {
out:
path_put(&nd->path);
- path_put(link);
+ path_put(&last->link);
}
}
return res;
@@ -1717,7 +1722,6 @@ static inline u64 hash_name(const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
- struct saved *last = nd->stack;
int err;

while (*name=='/')
@@ -1795,16 +1799,13 @@ Walked:

nd->depth++;
current->link_count++;
- last++;

- last->link = nd->link;
- s = get_link(&last->link, nd, &last->cookie);
+ s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
current->link_count--;
nd->depth--;
- last--;
goto Err;
}
err = 0;
@@ -1813,7 +1814,6 @@ Walked:
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1826,7 +1826,7 @@ Walked:
;
}
nd->inode = nd->path.dentry->d_inode;
- last->name = name;
+ nd->stack[nd->depth].name = name;
if (!*s)
goto OK;
name = s;
@@ -1844,17 +1844,15 @@ Err:
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
}
return err;
OK:
if (unlikely(nd->depth)) {
- name = last->name;
+ name = nd->stack[nd->depth].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
current->link_count--;
nd->depth--;
- last--;
goto Walked;
}
return 0;
@@ -1979,8 +1977,7 @@ static int trailing_symlink(struct nameidata *nd)
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
- nd->stack[0].link = nd->link;
- s = get_link(&nd->stack[0].link, nd, &nd->stack[0].cookie);
+ s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
if (unlikely(!s))
--
2.1.4

2015-05-11 18:30:39

by Al Viro

Subject: [PATCH v3 051/110] namei: remove restrictions on nesting depth

From: Al Viro <[email protected]>

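The only remaining restriction is on the total number of symlinks
crossed (MAXSYMLINKS, i.e. 40); how they are nested does not matter.
nd->stack starts out pointing at a small embedded array and is replaced
with a kmalloc'ed one once the nesting gets deeper than that.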

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 66 ++++++++++++++++++++++++++++++++++++++++-----------
include/linux/namei.h | 2 ++
2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e5715a5..1ae34cd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,7 @@ void path_put(const struct path *path)
}
EXPORT_SYMBOL(path_put);

+#define EMBEDDED_LEVELS 2
struct nameidata {
struct path path;
union {
@@ -509,9 +510,42 @@ struct nameidata {
struct path link;
void *cookie;
const char *name;
- } stack[MAX_NESTED_LINKS + 1];
+ } *stack, internal[EMBEDDED_LEVELS];
};

+static void set_nameidata(struct nameidata *nd)
+{
+ nd->stack = nd->internal;
+}
+
+static void restore_nameidata(struct nameidata *nd)
+{
+ if (nd->stack != nd->internal) {
+ kfree(nd->stack);
+ nd->stack = nd->internal;
+ }
+}
+
+static int __nd_alloc_stack(struct nameidata *nd)
+{
+ struct saved *p = kmalloc((MAXSYMLINKS + 1) * sizeof(struct saved),
+ GFP_KERNEL);
+ if (unlikely(!p))
+ return -ENOMEM;
+ memcpy(p, nd->internal, sizeof(nd->internal));
+ nd->stack = p;
+ return 0;
+}
+
+static inline int nd_alloc_stack(struct nameidata *nd)
+{
+ if (likely(nd->depth != EMBEDDED_LEVELS - 1))
+ return 0;
+ if (likely(nd->stack != nd->internal))
+ return 0;
+ return __nd_alloc_stack(nd);
+}
+
/*
* Path walking has 2 modes, rcu-walk and ref-walk (see
* Documentation/filesystems/path-lookup.txt). In situations when we can't
@@ -857,7 +891,7 @@ const char *get_link(struct nameidata *nd)
if (nd->link.mnt == nd->path.mnt)
mntget(nd->link.mnt);

- if (unlikely(current->total_link_count >= 40)) {
+ if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
path_put(&nd->path);
path_put(&nd->link);
return ERR_PTR(-ELOOP);
@@ -1789,22 +1823,18 @@ Walked:
if (err) {
const char *s;

- if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
- path_put_conditional(&nd->link, nd);
- path_put(&nd->path);
- err = -ELOOP;
- goto Err;
+ err = nd_alloc_stack(nd);
+ if (unlikely(err)) {
+ path_to_nameidata(&nd->link, nd);
+ break;
}
- BUG_ON(nd->depth >= MAX_NESTED_LINKS);

nd->depth++;
- current->link_count++;

s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- current->link_count--;
nd->depth--;
goto Err;
}
@@ -1812,7 +1842,6 @@ Walked:
if (unlikely(!s)) {
/* jumped */
put_link(nd);
- current->link_count--;
nd->depth--;
} else {
if (*s == '/') {
@@ -1842,7 +1871,6 @@ Walked:
Err:
while (unlikely(nd->depth)) {
put_link(nd);
- current->link_count--;
nd->depth--;
}
return err;
@@ -1851,7 +1879,6 @@ OK:
name = nd->stack[nd->depth].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
- current->link_count--;
nd->depth--;
goto Walked;
}
@@ -2055,7 +2082,11 @@ static int path_lookupat(int dfd, const struct filename *name,
static int filename_lookup(int dfd, struct filename *name,
unsigned int flags, struct nameidata *nd)
{
- int retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+ int retval;
+
+ set_nameidata(nd);
+ retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+
if (unlikely(retval == -ECHILD))
retval = path_lookupat(dfd, name, flags, nd);
if (unlikely(retval == -ESTALE))
@@ -2063,6 +2094,7 @@ static int filename_lookup(int dfd, struct filename *name,

if (likely(!retval))
audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+ restore_nameidata(nd);
return retval;
}

@@ -2393,6 +2425,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
int error;
if (IS_ERR(name))
return PTR_ERR(name);
+ set_nameidata(&nd);
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
error = path_mountpoint(dfd, name, path, &nd, flags);
@@ -2400,6 +2433,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
+ restore_nameidata(&nd);
putname(name);
return error;
}
@@ -3288,11 +3322,13 @@ struct file *do_filp_open(int dfd, struct filename *pathname,
int flags = op->lookup_flags;
struct file *filp;

+ set_nameidata(&nd);
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
filp = path_openat(dfd, pathname, &nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL);
+ restore_nameidata(&nd);
return filp;
}

@@ -3306,6 +3342,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,

nd.root.mnt = mnt;
nd.root.dentry = dentry;
+ set_nameidata(&nd);

if (d_is_symlink(dentry) && op->intent & LOOKUP_OPEN)
return ERR_PTR(-ELOOP);
@@ -3319,6 +3356,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
file = path_openat(-1, filename, &nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL);
+ restore_nameidata(&nd);
putname(filename);
return file;
}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a5d5bed..3a6cc96 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -11,6 +11,8 @@ struct nameidata;

enum { MAX_NESTED_LINKS = 8 };

+#define MAXSYMLINKS 40
+
/*
* Type of the last component on LOOKUP_PARENT
*/
--
2.1.4
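
A user-space sketch of the "embedded array first, heap on demand" scheme this patch introduces may help when reading nd_alloc_stack(). It is not the kernel code: struct walk_state, set_state(), restore_state() and alloc_stack() are hypothetical stand-ins for struct nameidata, set_nameidata(), restore_nameidata() and nd_alloc_stack(), malloc()/free() replace kmalloc()/kfree(), and the saved entry is reduced to a single pointer. The depth test matches the state of the code at this point in the series (a later patch adjusts it).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EMBEDDED_LEVELS 2   /* same value as in the patch */
#define MAXSYMLINKS 40      /* same value as in the patch */

struct saved {              /* simplified: the kernel entry holds a struct path and a cookie too */
    const char *name;
};

struct walk_state {         /* hypothetical stand-in for struct nameidata */
    int depth;
    struct saved *stack;
    struct saved internal[EMBEDDED_LEVELS];
};

/* Start out on the embedded array (the sketch also resets depth here). */
static void set_state(struct walk_state *ws)
{
    ws->depth = 0;
    ws->stack = ws->internal;
}

/* Free the heap copy, if one was ever needed, and fall back to the embedded array. */
static void restore_state(struct walk_state *ws)
{
    if (ws->stack != ws->internal) {
        free(ws->stack);
        ws->stack = ws->internal;
    }
}

/*
 * Called before depth is incremented for a new symlink.
 * Returns 0 on success, -1 on allocation failure.
 */
static int alloc_stack(struct walk_state *ws)
{
    struct saved *p;

    assert(ws->stack != NULL);              /* set_state() must have run */
    if (ws->depth != EMBEDDED_LEVELS - 1)   /* next entry still fits, or we switched already */
        return 0;
    if (ws->stack != ws->internal)          /* already on the heap copy */
        return 0;
    p = malloc((MAXSYMLINKS + 1) * sizeof(struct saved));
    if (!p)
        return -1;
    memcpy(p, ws->internal, sizeof(ws->internal));   /* keep the entries already in use */
    ws->stack = p;
    return 0;
}
```

The common case (little or no nesting) never touches the allocator; only a walk that is about to outgrow the embedded array pays for one allocation, and that copy is kept until restore_state().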

2015-05-11 18:30:31

by Al Viro

Subject: [PATCH v3 052/110] link_path_walk: nd->depth massage, part 1

From: Al Viro <[email protected]>

nd->stack[0] is unused until the handling of trailing symlinks and
we want to get rid of that. Having fucked that transformation up
several times, I went for a bloody pedantic series of provably equivalent
transformations. Sorry.

Step 1: keep nd->depth higher by one in link_path_walk() - increment upon
entry, decrement on exits, adjust the arithmetic inside and surround the
calls of functions that care about nd->depth value (nd_alloc_stack(),
get_link(), put_link()) with decrement/increment pairs.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1ae34cd..e408f4d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1763,6 +1763,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

+ nd->depth++;
/* At this point we know we have a real path component. */
for(;;) {
u64 hash_len;
@@ -1823,15 +1824,18 @@ Walked:
if (err) {
const char *s;

+ nd->depth--;
err = nd_alloc_stack(nd);
+ nd->depth++;
if (unlikely(err)) {
path_to_nameidata(&nd->link, nd);
break;
}

nd->depth++;
-
+ nd->depth--;
s = get_link(nd);
+ nd->depth++;

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
@@ -1841,7 +1845,9 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
} else {
if (*s == '/') {
@@ -1855,7 +1861,7 @@ Walked:
;
}
nd->inode = nd->path.dentry->d_inode;
- nd->stack[nd->depth].name = name;
+ nd->stack[nd->depth - 1].name = name;
if (!*s)
goto OK;
name = s;
@@ -1869,19 +1875,25 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth)) {
+ while (unlikely(nd->depth > 1)) {
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
}
+ nd->depth--;
return err;
OK:
- if (unlikely(nd->depth)) {
- name = nd->stack[nd->depth].name;
+ if (unlikely(nd->depth > 1)) {
+ name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
+ nd->depth--;
put_link(nd);
+ nd->depth++;
nd->depth--;
goto Walked;
}
+ nd->depth--;
return 0;
}

--
2.1.4

2015-05-11 18:30:29

by Al Viro

Subject: [PATCH v3 053/110] link_path_walk: nd->depth massage, part 2

From: Al Viro <[email protected]>

collapse adjacent increment/decrement pairs.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e408f4d..a403425 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1832,8 +1832,6 @@ Walked:
break;
}

- nd->depth++;
- nd->depth--;
s = get_link(nd);
nd->depth++;

@@ -1847,8 +1845,6 @@ Walked:
/* jumped */
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
} else {
if (*s == '/') {
if (!nd->root.mnt)
@@ -1878,8 +1874,6 @@ Err:
while (unlikely(nd->depth > 1)) {
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
}
nd->depth--;
return err;
@@ -1889,8 +1883,6 @@ OK:
err = walk_component(nd, LOOKUP_FOLLOW);
nd->depth--;
put_link(nd);
- nd->depth++;
- nd->depth--;
goto Walked;
}
nd->depth--;
--
2.1.4

2015-05-11 18:30:26

by Al Viro

Subject: [PATCH v3 054/110] link_path_walk: nd->depth massage, part 3

From: Al Viro <[email protected]>

remove decrement/increment surrounding nd_alloc_stack(), adjust the
test in it.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a403425..3df4731 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -539,7 +539,7 @@ static int __nd_alloc_stack(struct nameidata *nd)

static inline int nd_alloc_stack(struct nameidata *nd)
{
- if (likely(nd->depth != EMBEDDED_LEVELS - 1))
+ if (likely(nd->depth != EMBEDDED_LEVELS))
return 0;
if (likely(nd->stack != nd->internal))
return 0;
@@ -1824,9 +1824,7 @@ Walked:
if (err) {
const char *s;

- nd->depth--;
err = nd_alloc_stack(nd);
- nd->depth++;
if (unlikely(err)) {
path_to_nameidata(&nd->link, nd);
break;
--
2.1.4

2015-05-11 18:30:21

by Al Viro

Subject: [PATCH v3 055/110] link_path_walk: nd->depth massage, part 4

From: Al Viro <[email protected]>

lift increment/decrement into link_path_walk() callers.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3df4731..cf5009b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1763,7 +1763,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!*name)
return 0;

- nd->depth++;
/* At this point we know we have a real path component. */
for(;;) {
u64 hash_len;
@@ -1873,7 +1872,6 @@ Err:
nd->depth--;
put_link(nd);
}
- nd->depth--;
return err;
OK:
if (unlikely(nd->depth > 1)) {
@@ -1883,7 +1881,6 @@ OK:
put_link(nd);
goto Walked;
}
- nd->depth--;
return 0;
}

@@ -1986,7 +1983,10 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
return -ECHILD;
done:
current->total_link_count = 0;
- return link_path_walk(s, nd);
+ nd->depth++;
+ retval = link_path_walk(s, nd);
+ nd->depth--;
+ return retval;
}

static void path_cleanup(struct nameidata *nd)
@@ -2020,7 +2020,9 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
+ nd->depth++;
error = link_path_walk(s, nd);
+ nd->depth--;
if (unlikely(error))
put_link(nd);
return error;
--
2.1.4

2015-05-11 18:28:05

by Al Viro

Subject: [PATCH v3 056/110] trailing_symlink: nd->depth massage, part 5

From: Al Viro <[email protected]>

move increment of ->depth to the point where we'd discovered
that get_link() has not returned an error, adjust exits
accordingly.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cf5009b..5753f46 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2009,8 +2009,11 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- if (unlikely(!s))
+ nd->depth++;
+ if (unlikely(!s)) {
+ nd->depth--;
return 0;
+ }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -2020,12 +2023,14 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
- nd->depth++;
error = link_path_walk(s, nd);
- nd->depth--;
- if (unlikely(error))
+ if (unlikely(error)) {
+ nd->depth--;
put_link(nd);
- return error;
+ return error;
+ }
+ nd->depth--;
+ return 0;
}

static inline int lookup_last(struct nameidata *nd)
--
2.1.4

2015-05-11 18:28:07

by Al Viro

Subject: [PATCH v3 057/110] get_link: nd->depth massage, part 6

From: Al Viro <[email protected]>

make get_link() increment nd->depth on successful exit

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5753f46..93b5f73 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -918,8 +918,10 @@ const char *get_link(struct nameidata *nd)
out:
path_put(&nd->path);
path_put(&last->link);
+ return res;
}
}
+ nd->depth++;
return res;
}

@@ -1830,11 +1832,9 @@ Walked:
}

s = get_link(nd);
- nd->depth++;

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- nd->depth--;
goto Err;
}
err = 0;
@@ -2009,7 +2009,6 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- nd->depth++;
if (unlikely(!s)) {
nd->depth--;
return 0;
--
2.1.4

2015-05-11 18:28:03

by Al Viro

Subject: [PATCH v3 058/110] trailing_symlink: nd->depth massage, part 7

From: Al Viro <[email protected]>

move decrement of nd->depth on successful returns into the callers.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 93b5f73..9df1c7a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2009,10 +2009,8 @@ static int trailing_symlink(struct nameidata *nd)
s = get_link(nd);
if (unlikely(IS_ERR(s)))
return PTR_ERR(s);
- if (unlikely(!s)) {
- nd->depth--;
+ if (unlikely(!s))
return 0;
- }
if (*s == '/') {
if (!nd->root.mnt)
set_root(nd);
@@ -2028,7 +2026,6 @@ static int trailing_symlink(struct nameidata *nd)
put_link(nd);
return error;
}
- nd->depth--;
return 0;
}

@@ -2069,6 +2066,7 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
+ nd->depth--;
put_link(nd);
}
}
@@ -2418,6 +2416,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
+ nd->depth--;
put_link(nd);
}
out:
@@ -3302,6 +3301,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
+ nd->depth--;
put_link(nd);
}
out:
--
2.1.4

2015-05-11 18:27:59

by Al Viro

Subject: [PATCH v3 059/110] put_link: nd->depth massage, part 8

From: Al Viro <[email protected]>

all calls are preceded by decrement of nd->depth; move it into
put_link() itself.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9df1c7a..1f61955 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -754,7 +754,7 @@ void nd_jump_link(struct nameidata *nd, struct path *path)

static inline void put_link(struct nameidata *nd)
{
- struct saved *last = nd->stack + nd->depth;
+ struct saved *last = nd->stack + --nd->depth;
struct inode *inode = last->link.dentry->d_inode;
if (last->cookie && inode->i_op->put_link)
inode->i_op->put_link(last->link.dentry, last->cookie);
@@ -1840,7 +1840,6 @@ Walked:
err = 0;
if (unlikely(!s)) {
/* jumped */
- nd->depth--;
put_link(nd);
} else {
if (*s == '/') {
@@ -1868,16 +1867,13 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > 1)) {
- nd->depth--;
+ while (unlikely(nd->depth > 1))
put_link(nd);
- }
return err;
OK:
if (unlikely(nd->depth > 1)) {
name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
- nd->depth--;
put_link(nd);
goto Walked;
}
@@ -2021,12 +2017,9 @@ static int trailing_symlink(struct nameidata *nd)
}
nd->inode = nd->path.dentry->d_inode;
error = link_path_walk(s, nd);
- if (unlikely(error)) {
- nd->depth--;
+ if (unlikely(error))
put_link(nd);
- return error;
- }
- return 0;
+ return error;
}

static inline int lookup_last(struct nameidata *nd)
@@ -2066,7 +2059,6 @@ static int path_lookupat(int dfd, const struct filename *name,
if (err)
break;
err = lookup_last(nd);
- nd->depth--;
put_link(nd);
}
}
@@ -2416,7 +2408,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (err)
break;
err = mountpoint_last(nd, path);
- nd->depth--;
put_link(nd);
}
out:
@@ -3301,7 +3292,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
break;
error = do_last(nd, file, op, &opened, pathname);
- nd->depth--;
put_link(nd);
}
out:
--
2.1.4
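
After parts 6 and 8, nd->depth behaves like an ordinary stack pointer: get_link() fills nd->stack[nd->depth] and bumps the depth only on success, and put_link() pre-decrements before releasing. A minimal user-space sketch of that discipline follows. The names push_link(), pop_link() and struct walk_state are hypothetical, the entry is reduced to one pointer, and the -1 return stands in for the kernel's ERR_PTR() error path.

```c
#include <assert.h>
#include <stddef.h>

#define MAXSYMLINKS 40

struct saved {
    const char *link;   /* simplified: the kernel keeps a struct path and a cookie */
};

struct walk_state {     /* hypothetical stand-in for struct nameidata */
    int depth;          /* number of entries currently in use */
    struct saved stack[MAXSYMLINKS];
};

/* Like get_link() after part 6: depth moves only when the push succeeds. */
static int push_link(struct walk_state *ws, const char *link)
{
    struct saved *last = ws->stack + ws->depth;

    if (ws->depth >= MAXSYMLINKS)
        return -1;      /* failure: depth is left untouched */
    last->link = link;
    ws->depth++;
    return 0;
}

/* Like put_link() after part 8: pre-decrement, then hand back the entry. */
static const char *pop_link(struct walk_state *ws)
{
    struct saved *last;

    assert(ws->depth > 0);
    last = ws->stack + --ws->depth;
    return last->link;
}
```

With both adjustments inside the helpers, callers no longer touch the depth at all, which is what lets the diff above delete every explicit nd->depth-- next to a put_link() call.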

2015-05-11 18:27:57

by Al Viro

Subject: [PATCH v3 060/110] link_path_walk: nd->depth massage, part 9

From: Al Viro <[email protected]>

Make link_path_walk() work with any value of nd->depth on entry -
remember it and use it in tests instead of comparing with 1.
Don't bother with increment/decrement in path_init().

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1f61955..bc6d67e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1759,6 +1759,7 @@ static inline u64 hash_name(const char *name)
static int link_path_walk(const char *name, struct nameidata *nd)
{
int err;
+ int orig_depth = nd->depth;

while (*name=='/')
name++;
@@ -1867,11 +1868,11 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > 1))
+ while (unlikely(nd->depth > orig_depth))
put_link(nd);
return err;
OK:
- if (unlikely(nd->depth > 1)) {
+ if (unlikely(nd->depth > orig_depth)) {
name = nd->stack[nd->depth - 1].name;
err = walk_component(nd, LOOKUP_FOLLOW);
put_link(nd);
@@ -1979,10 +1980,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
return -ECHILD;
done:
current->total_link_count = 0;
- nd->depth++;
- retval = link_path_walk(s, nd);
- nd->depth--;
- return retval;
+ return link_path_walk(s, nd);
}

static void path_cleanup(struct nameidata *nd)
--
2.1.4

2015-05-11 18:27:55

by Al Viro

Subject: [PATCH v3 061/110] link_path_walk: nd->depth massage, part 10

From: Al Viro <[email protected]>

Get rid of orig_depth checks in the OK: logic. If nd->depth is
zero, we had been called from path_init() and we are done.
If it is greater than 1, we are not done, whether we'd been
called from path_init() or trailing_symlink(). And in the
case when it's 1, we might have been called from path_init()
and reached the end of a nested symlink (in which case
nd->stack[0].name will point to the rest of pathname and
we are not done) or from trailing_symlink(), in which case
we are done.

Just have trailing_symlink() leave NULL in nd->stack[0].name
and use that to discriminate between those cases.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bc6d67e..6febe25 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1872,13 +1872,15 @@ Err:
put_link(nd);
return err;
OK:
- if (unlikely(nd->depth > orig_depth)) {
- name = nd->stack[nd->depth - 1].name;
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
- goto Walked;
- }
- return 0;
+ if (!nd->depth) /* called from path_init(), done */
+ return 0;
+ name = nd->stack[nd->depth - 1].name;
+ if (!name) /* called from trailing_symlink(), done */
+ return 0;
+
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd);
+ goto Walked;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
@@ -2014,6 +2016,7 @@ static int trailing_symlink(struct nameidata *nd)
nd->flags |= LOOKUP_JUMPED;
}
nd->inode = nd->path.dentry->d_inode;
+ nd->stack[0].name = NULL;
error = link_path_walk(s, nd);
if (unlikely(error))
put_link(nd);
--
2.1.4
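
The three cases in the commit message reduce to one lookup once trailing_symlink() leaves NULL in nd->stack[0].name. A small sketch of that decision, assuming a hypothetical struct walk_state and helper next_name() that are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

struct saved {
    const char *name;   /* rest of the enclosing pathname, or NULL as a sentinel */
};

struct walk_state {     /* hypothetical stand-in for struct nameidata */
    int depth;
    struct saved stack[4];
};

/*
 * Decide what to do when the string being walked is exhausted.
 * Returns the saved remainder to continue with, or NULL when the walk is done.
 */
static const char *next_name(const struct walk_state *ws)
{
    assert(ws->depth >= 0);
    if (!ws->depth)                         /* top-level pathname: done */
        return NULL;
    return ws->stack[ws->depth - 1].name;   /* NULL marks a trailing symlink: done */
}
```

A depth greater than 1 needs no special handling: the top entry always carries a real remainder, and the sentinel in entry 0 is only seen once everything above it has been popped.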

2015-05-11 18:27:49

by Al Viro

Subject: [PATCH v3 062/110] link_path_walk: end of nd->depth massage

From: Al Viro <[email protected]>

get rid of orig_depth - we only use it on error exit to tell whether
to stop doing put_link() when depth reaches 0 (call from path_init())
or when it reaches 1 (call from trailing_symlink()). However, in
the latter case the caller would immediately follow with one more
put_link(). Just keep doing it until the depth reaches zero (and
simplify trailing_symlink() as a result).

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6febe25..d12b16c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1759,7 +1759,6 @@ static inline u64 hash_name(const char *name)
static int link_path_walk(const char *name, struct nameidata *nd)
{
int err;
- int orig_depth = nd->depth;

while (*name=='/')
name++;
@@ -1868,7 +1867,7 @@ Walked:
}
terminate_walk(nd);
Err:
- while (unlikely(nd->depth > orig_depth))
+ while (unlikely(nd->depth))
put_link(nd);
return err;
OK:
@@ -2017,10 +2016,7 @@ static int trailing_symlink(struct nameidata *nd)
}
nd->inode = nd->path.dentry->d_inode;
nd->stack[0].name = NULL;
- error = link_path_walk(s, nd);
- if (unlikely(error))
- put_link(nd);
- return error;
+ return link_path_walk(s, nd);
}

static inline int lookup_last(struct nameidata *nd)
--
2.1.4

2015-05-11 18:27:47

by Al Viro

Subject: [PATCH v3 063/110] namei: we never need more than MAXSYMLINKS entries in nd->stack

From: Al Viro <[email protected]>

The only reason why we needed one more was that MAXSYMLINKS purely
nested symlinks could lead to path_init() using that many
entries in addition to nd->stack[0], which it left unused.

That can't happen now - path_init() starts with entry 0 (and
trailing_symlink() is called only when we'd already encountered
one symlink, so no more than MAXSYMLINKS-1 are left).

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index d12b16c..b939f48 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -528,7 +528,7 @@ static void restore_nameidata(struct nameidata *nd)

static int __nd_alloc_stack(struct nameidata *nd)
{
- struct saved *p = kmalloc((MAXSYMLINKS + 1) * sizeof(struct saved),
+ struct saved *p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
GFP_KERNEL);
if (unlikely(!p))
return -ENOMEM;
--
2.1.4
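
To see why MAXSYMLINKS entries are enough, here is a toy, user-space version of a non-recursive walk with an explicit stack of saved remainders. It is only a sketch: struct toy_link, toy_readlink() and toy_walk() are hypothetical, symlinks are looked up by component name in a flat table, and expansion is purely textual (no directories, no "..", no absolute link bodies). What it shares with the series is the shape: one loop, one stack, and a single limit on the total number of symlinks crossed.

```c
#include <assert.h>
#include <string.h>

#define MAXSYMLINKS 40

struct toy_link {       /* hypothetical symlink table entry */
    const char *name;   /* component that is a symlink */
    const char *body;   /* relative path it expands to */
};

/* Return the body of the symlink named by comp[0..len), or NULL if it is not one. */
static const char *toy_readlink(const struct toy_link *tab, size_t n,
                                const char *comp, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(tab[i].name) == len && !memcmp(tab[i].name, comp, len))
            return tab[i].body;
    return NULL;
}

/*
 * Expand every symlink component of a relative path into out.
 * Returns 0 on success, -1 on too many symlinks or output overflow.
 */
static int toy_walk(const struct toy_link *tab, size_t n,
                    const char *name, char *out, size_t outsz)
{
    const char *stack[MAXSYMLINKS];   /* saved rest of pathname, one per nesting level */
    int depth = 0, total = 0;
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (;;) {
        size_t len;
        const char *body;

        while (*name == '/')
            name++;
        if (!*name) {
            if (!depth)               /* nothing saved: the whole path is done */
                return 0;
            name = stack[--depth];    /* resume the enclosing pathname */
            continue;
        }
        len = strcspn(name, "/");
        body = toy_readlink(tab, n, name, len);
        if (body) {
            if (total++ >= MAXSYMLINKS)
                return -1;            /* the only limit: total symlinks crossed */
            stack[depth++] = name + len;   /* depth never exceeds total */
            name = body;
            continue;
        }
        if (used + len + 2 > outsz)   /* room for '/', the component and the NUL */
            return -1;
        if (used)
            out[used++] = '/';
        memcpy(out + used, name, len);
        used += len;
        out[used] = '\0';
        name += len;
    }
}
```

Every push is preceded by a successful check against the total count, so the depth can never exceed the number of symlinks crossed so far, and MAXSYMLINKS slots cover the worst case of purely nested links.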

2015-05-11 18:27:52

by Al Viro

Subject: [PATCH v3 064/110] namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers

From: Al Viro <[email protected]>

follow_dotdot_rcu() does an equivalent of terminate_walk() on failure;
shifting it into callers makes for simpler rules and those callers
already have terminate_walk() on other failure exits.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b939f48..25cd935 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1230,10 +1230,6 @@ static int follow_dotdot_rcu(struct nameidata *nd)
return 0;

failed:
- nd->flags &= ~LOOKUP_RCU;
- if (!(nd->flags & LOOKUP_ROOT))
- nd->root.mnt = NULL;
- rcu_read_unlock();
return -ECHILD;
}

@@ -1551,8 +1547,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
{
if (type == LAST_DOTDOT) {
if (nd->flags & LOOKUP_RCU) {
- if (follow_dotdot_rcu(nd))
- return -ECHILD;
+ return follow_dotdot_rcu(nd);
} else
follow_dotdot(nd);
}
@@ -1592,8 +1587,12 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM))
- return handle_dots(nd, nd->last_type);
+ if (unlikely(nd->last_type != LAST_NORM)) {
+ err = handle_dots(nd, nd->last_type);
+ if (err)
+ goto out_err;
+ return 0;
+ }
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
@@ -2981,8 +2980,10 @@ static int do_last(struct nameidata *nd,

if (nd->last_type != LAST_NORM) {
error = handle_dots(nd, nd->last_type);
- if (error)
+ if (unlikely(error)) {
+ terminate_walk(nd);
return error;
+ }
goto finish_open;
}

--
2.1.4
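
The pattern this patch starts (and the following ones continue) is to let low-level helpers report failure without cleaning up, so that cleanup happens in one place further up. A deliberately tiny sketch of the end state, with hypothetical names throughout (struct toy_nd, toy_terminate(), toy_dotdot(), toy_component(), toy_path_walk()); the real patch only moves the cleanup up by one level:

```c
#include <assert.h>

struct toy_nd {            /* hypothetical stand-in for struct nameidata */
    int in_walk;           /* nonzero while the walk holds its references */
    int terminated;        /* how many times cleanup ran */
};

/* Counterpart of terminate_walk(): drop whatever the walk is holding. */
static void toy_terminate(struct toy_nd *nd)
{
    assert(nd->in_walk);   /* a second cleanup of the same walk would be a bug */
    nd->in_walk = 0;
    nd->terminated++;
}

/* Lowest-level helper: reports failure and leaves cleanup to the caller. */
static int toy_dotdot(struct toy_nd *nd, int fail)
{
    (void)nd;
    return fail ? -1 : 0;
}

/* Middle layer: passes the error through untouched. */
static int toy_component(struct toy_nd *nd, int fail)
{
    return toy_dotdot(nd, fail);
}

/* Top-level loop: the only place that cleans up on failure. */
static int toy_path_walk(struct toy_nd *nd, const int *fail, int n)
{
    int err = 0;

    nd->in_walk = 1;
    for (int i = 0; i < n; i++) {
        err = toy_component(nd, fail[i]);
        if (err < 0)
            break;
    }
    if (err < 0)
        toy_terminate(nd);
    return err;
}
```

With a single cleanup site the rule for helpers becomes "return the error, touch nothing", which is easier to audit than remembering which helper has already torn the walk down.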

2015-05-11 18:26:05

by Al Viro

Subject: [PATCH v3 065/110] lift terminate_walk() into callers of walk_component()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 25cd935..d99eaac 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1587,20 +1587,16 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM)) {
- err = handle_dots(nd, nd->last_type);
- if (err)
- goto out_err;
- return 0;
- }
+ if (unlikely(nd->last_type != LAST_NORM))
+ return handle_dots(nd, nd->last_type);
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
- goto out_err;
+ return err;

err = lookup_slow(nd, &path);
if (err < 0)
- goto out_err;
+ return err;

inode = path.dentry->d_inode;
err = -ENOENT;
@@ -1612,8 +1608,7 @@ static int walk_component(struct nameidata *nd, int follow)
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path.mnt ||
unlazy_walk(nd, path.dentry))) {
- err = -ECHILD;
- goto out_err;
+ return -ECHILD;
}
}
BUG_ON(inode != path.dentry->d_inode);
@@ -1626,8 +1621,6 @@ static int walk_component(struct nameidata *nd, int follow)

out_path_put:
path_to_nameidata(&path, nd);
-out_err:
- terminate_walk(nd);
return err;
}

@@ -1819,7 +1812,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
err = walk_component(nd, LOOKUP_FOLLOW);
Walked:
if (err < 0)
- goto Err;
+ break;

if (err) {
const char *s;
@@ -2020,11 +2013,15 @@ static int trailing_symlink(struct nameidata *nd)

static inline int lookup_last(struct nameidata *nd)
{
+ int err;
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- return walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ if (err < 0)
+ terminate_walk(nd);
+ return err;
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
--
2.1.4

2015-05-11 18:26:04

by Al Viro

Subject: [PATCH v3 066/110] namei: lift (open-coded) terminate_walk() into callers of get_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d99eaac..db21e04 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -892,7 +892,6 @@ const char *get_link(struct nameidata *nd)
mntget(nd->link.mnt);

if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
- path_put(&nd->path);
path_put(&nd->link);
return ERR_PTR(-ELOOP);
}
@@ -916,7 +915,6 @@ const char *get_link(struct nameidata *nd)
res = inode->i_op->follow_link(dentry, &last->cookie, nd);
if (IS_ERR(res)) {
out:
- path_put(&nd->path);
path_put(&last->link);
return res;
}
@@ -1827,7 +1825,7 @@ Walked:

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
- goto Err;
+ break;
}
err = 0;
if (unlikely(!s)) {
@@ -1858,7 +1856,6 @@ Walked:
}
}
terminate_walk(nd);
-Err:
while (unlikely(nd->depth))
put_link(nd);
return err;
@@ -1994,8 +1991,10 @@ static int trailing_symlink(struct nameidata *nd)
return error;
nd->flags |= LOOKUP_PARENT;
s = get_link(nd);
- if (unlikely(IS_ERR(s)))
+ if (unlikely(IS_ERR(s))) {
+ terminate_walk(nd);
return PTR_ERR(s);
+ }
if (unlikely(!s))
return 0;
if (*s == '/') {
--
2.1.4

2015-05-11 18:24:34

by Al Viro

Subject: [PATCH v3 067/110] namei: take put_link() into {lookup,mountpoint,do}_last()

From: Al Viro <[email protected]>

rationale: we'll need to have terminate_walk() do put_link() on
everything, which will mean that in some cases ..._last() will do
put_link() anyway. Easier to have them do it in all cases.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db21e04..26466fb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2018,6 +2018,8 @@ static inline int lookup_last(struct nameidata *nd)

nd->flags &= ~LOOKUP_PARENT;
err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
+ if (nd->depth)
+ put_link(nd);
if (err < 0)
terminate_walk(nd);
return err;
@@ -2045,13 +2047,10 @@ static int path_lookupat(int dfd, const struct filename *name,
*/
err = path_init(dfd, name, flags, nd);
if (!err && !(flags & LOOKUP_PARENT)) {
- err = lookup_last(nd);
- while (err > 0) {
+ while ((err = lookup_last(nd)) > 0) {
err = trailing_symlink(nd);
if (err)
break;
- err = lookup_last(nd);
- put_link(nd);
}
}

@@ -2362,6 +2361,8 @@ done:
dput(dentry);
goto out;
}
+ if (nd->depth)
+ put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
@@ -2373,6 +2374,8 @@ done:
error = 0;
out:
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;
}

@@ -2394,13 +2397,10 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
if (unlikely(err))
goto out;

- err = mountpoint_last(nd, path);
- while (err > 0) {
+ while ((err = mountpoint_last(nd, path)) > 0) {
err = trailing_symlink(nd);
if (err)
break;
- err = mountpoint_last(nd, path);
- put_link(nd);
}
out:
path_cleanup(nd);
@@ -2978,6 +2978,8 @@ static int do_last(struct nameidata *nd,
error = handle_dots(nd, nd->last_type);
if (unlikely(error)) {
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;
}
goto finish_open;
@@ -3003,8 +3005,11 @@ static int do_last(struct nameidata *nd,
* about to look up
*/
error = complete_walk(nd);
- if (error)
+ if (error) {
+ if (nd->depth)
+ put_link(nd);
return error;
+ }

audit_inode(name, dir, LOOKUP_PARENT);
error = -EISDIR;
@@ -3093,6 +3098,8 @@ finish_lookup:
}
}
BUG_ON(inode != path.dentry->d_inode);
+ if (nd->depth)
+ put_link(nd);
nd->link = path;
return 1;
}
@@ -3116,6 +3123,8 @@ finish_lookup:
finish_open:
error = complete_walk(nd);
if (error) {
+ if (nd->depth)
+ put_link(nd);
path_put(&save_parent);
return error;
}
@@ -3167,6 +3176,8 @@ out:
mnt_drop_write(nd->path.mnt);
path_put(&save_parent);
terminate_walk(nd);
+ if (nd->depth)
+ put_link(nd);
return error;

exit_dput:
@@ -3279,14 +3290,11 @@ static struct file *path_openat(int dfd, struct filename *pathname,
if (unlikely(error))
goto out;

- error = do_last(nd, file, op, &opened, pathname);
- while (unlikely(error > 0)) { /* trailing symlink */
+ while ((error = do_last(nd, file, op, &opened, pathname)) > 0) {
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
error = trailing_symlink(nd);
if (unlikely(error))
break;
- error = do_last(nd, file, op, &opened, pathname);
- put_link(nd);
}
out:
path_cleanup(nd);
--
2.1.4
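
A minimal user-space sketch of the calling pattern the patch above sets up; it is not the kernel code. Once the ..._last() helper releases any pending link itself, the caller's loop collapses into a single while. The names toy_walk, toy_put_link, toy_trailing_symlink, toy_last and toy_lookup are hypothetical, and the links_left counter is only a stand-in for real lookups.

```c
#include <assert.h>

/* Hypothetical stand-in for struct nameidata: only what the sketch needs. */
struct toy_walk {
    int depth;       /* links currently held on the stack */
    int links_left;  /* trailing symlinks still to be followed */
    int puts;        /* how many times a link was released */
};

static void toy_put_link(struct toy_walk *w)
{
    assert(w->depth > 0);
    w->depth--;
    w->puts++;
}

/* Picks up the trailing symlink; the real code does much more here. */
static int toy_trailing_symlink(struct toy_walk *w)
{
    w->depth++;
    return 0;
}

/* Returns 1 if the last component is a symlink to follow, 0 when done. */
static int toy_last(struct toy_walk *w)
{
    int ret = 0;

    if (w->links_left > 0) {  /* pretend lookup: another symlink found */
        w->links_left--;
        ret = 1;
    }
    if (w->depth)             /* release the link we were called from */
        toy_put_link(w);
    return ret;
}

/* Caller shape after the patch: no put_link() left in the loop body. */
static int toy_lookup(struct toy_walk *w)
{
    int err;

    while ((err = toy_last(w)) > 0) {
        err = toy_trailing_symlink(w);
        if (err)
            break;
    }
    return err;
}
```

Starting from depth 0 with three trailing symlinks, toy_lookup() returns 0 with nothing left on the stack and three releases recorded, all of them done inside toy_last().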

2015-05-11 18:24:37

by Al Viro

Subject: [PATCH v3 068/110] namei: have terminate_walk() do put_link() on everything left

From: Al Viro <[email protected]>

All callers of terminate_walk() are followed by more or less
open-coded equivalent of "do put_link() on everything left
in nd->stack". Better done in terminate_walk() itself, and
when we go for RCU symlink traversal we'll have to do it
there anyway.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 26466fb..31da717 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1562,6 +1562,8 @@ static void terminate_walk(struct nameidata *nd)
nd->root.mnt = NULL;
rcu_read_unlock();
}
+ while (unlikely(nd->depth))
+ put_link(nd);
}

/*
@@ -1856,8 +1858,6 @@ Walked:
}
}
terminate_walk(nd);
- while (unlikely(nd->depth))
- put_link(nd);
return err;
OK:
if (!nd->depth) /* called from path_init(), done */
@@ -2374,8 +2374,6 @@ done:
error = 0;
out:
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;
}

@@ -2978,8 +2976,6 @@ static int do_last(struct nameidata *nd,
error = handle_dots(nd, nd->last_type);
if (unlikely(error)) {
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;
}
goto finish_open;
@@ -3176,8 +3172,6 @@ out:
mnt_drop_write(nd->path.mnt);
path_put(&save_parent);
terminate_walk(nd);
- if (nd->depth)
- put_link(nd);
return error;

exit_dput:
--
2.1.4
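
The effect of the patch above can be sketched in user space as follows. This is a toy model, not the kernel code: toy_nd, toy_put_link and toy_terminate_walk are hypothetical names, and the sketch assumes that put_link() pops the most recently saved entry.

```c
#include <assert.h>

#define TOY_MAX_DEPTH 8  /* arbitrary bound for the sketch */

struct toy_nd {
    int depth;                    /* number of saved links */
    int stack[TOY_MAX_DEPTH];     /* ids of the saved links */
    int released[TOY_MAX_DEPTH];  /* ids in the order they were dropped */
    int nreleased;
};

/* Pops the top entry and records that it was released. */
static void toy_put_link(struct toy_nd *nd)
{
    assert(nd->depth > 0);
    nd->released[nd->nreleased++] = nd->stack[--nd->depth];
}

static void toy_terminate_walk(struct toy_nd *nd)
{
    /* dropping path references and leaving RCU mode is omitted here */
    while (nd->depth)
        toy_put_link(nd);
}
```

With the draining loop inside toy_terminate_walk(), callers no longer need their own "if (depth) put_link()" after it, and calling it on an already empty stack does nothing.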

2015-05-11 18:24:39

by Al Viro

Subject: [PATCH v3 069/110] link_path_walk: move the OK: inside the loop

From: Al Viro <[email protected]>

fewer labels that way; in particular, resuming after the end of
nested symlink is straight-line.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 31da717..19e5c8a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1806,11 +1806,21 @@ static int link_path_walk(const char *name, struct nameidata *nd)
do {
name++;
} while (unlikely(*name == '/'));
- if (!*name)
- goto OK;
-
- err = walk_component(nd, LOOKUP_FOLLOW);
-Walked:
+ if (unlikely(!*name)) {
+OK:
+ /* called from path_init(), done */
+ if (!nd->depth)
+ return 0;
+ name = nd->stack[nd->depth - 1].name;
+ /* called from trailing_symlink(), done */
+ if (!name)
+ return 0;
+ /* last component of nested symlink */
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ put_link(nd);
+ } else {
+ err = walk_component(nd, LOOKUP_FOLLOW);
+ }
if (err < 0)
break;

@@ -1859,16 +1869,6 @@ Walked:
}
terminate_walk(nd);
return err;
-OK:
- if (!nd->depth) /* called from path_init(), done */
- return 0;
- name = nd->stack[nd->depth - 1].name;
- if (!name) /* called from trailing_symlink(), done */
- return 0;
-
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
- goto Walked;
}

static int path_init(int dfd, const struct filename *name, unsigned int flags,
--
2.1.4

2015-05-11 18:24:31

by Al Viro

Subject: [PATCH v3 070/110] namei: new calling conventions for walk_component()

From: Al Viro <[email protected]>

instead of a single flag (!= 0 => we want to follow symlinks) pass
two bits - WALK_GET (want to follow symlinks) and WALK_PUT (put_link()
once we are done looking at the name). The latter matters only for
success exits - on failure the caller will discard everything anyway.

Suggestions for a better variant are welcome; what this thing aims for
is making sure that pending put_link() is done *before* walk_component()
decides to pick a symlink up, rather than between picking it up and
acting upon it. See the next commit for payoff.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 19e5c8a..1c9af925 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1577,7 +1577,9 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
return unlikely(d_is_symlink(dentry)) ? follow : 0;
}

-static int walk_component(struct nameidata *nd, int follow)
+enum {WALK_GET = 1, WALK_PUT = 2};
+
+static int walk_component(struct nameidata *nd, int flags)
{
struct path path;
struct inode *inode;
@@ -1587,8 +1589,12 @@ static int walk_component(struct nameidata *nd, int follow)
* to be able to know about the current root directory and
* parent relationships.
*/
- if (unlikely(nd->last_type != LAST_NORM))
- return handle_dots(nd, nd->last_type);
+ if (unlikely(nd->last_type != LAST_NORM)) {
+ err = handle_dots(nd, nd->last_type);
+ if (flags & WALK_PUT)
+ put_link(nd);
+ return err;
+ }
err = lookup_fast(nd, &path, &inode);
if (unlikely(err)) {
if (err < 0)
@@ -1604,7 +1610,9 @@ static int walk_component(struct nameidata *nd, int follow)
goto out_path_put;
}

- if (should_follow_link(path.dentry, follow)) {
+ if (flags & WALK_PUT)
+ put_link(nd);
+ if (should_follow_link(path.dentry, flags & WALK_GET)) {
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != path.mnt ||
unlazy_walk(nd, path.dentry))) {
@@ -1816,10 +1824,9 @@ OK:
if (!name)
return 0;
/* last component of nested symlink */
- err = walk_component(nd, LOOKUP_FOLLOW);
- put_link(nd);
+ err = walk_component(nd, WALK_GET | WALK_PUT);
} else {
- err = walk_component(nd, LOOKUP_FOLLOW);
+ err = walk_component(nd, WALK_GET);
}
if (err < 0)
break;
@@ -2017,9 +2024,12 @@ static inline int lookup_last(struct nameidata *nd)
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- err = walk_component(nd, nd->flags & LOOKUP_FOLLOW);
- if (nd->depth)
- put_link(nd);
+ err = walk_component(nd,
+ nd->flags & LOOKUP_FOLLOW
+ ? nd->depth
+ ? WALK_PUT | WALK_GET
+ : WALK_GET
+ : 0);
if (err < 0)
terminate_walk(nd);
return err;
--
2.1.4
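
The ordering that the WALK_GET/WALK_PUT convention is after can be shown with a small user-space model. It is a sketch under assumptions, not the kernel function: toy_nd, toy_log, toy_put_link, toy_walk_component and toy_last_flags are hypothetical, and the "lookup" is reduced to an is_symlink argument. The log records 'P' for a put_link() and 'G' for a symlink being picked up.

```c
#include <assert.h>
#include <string.h>

enum { WALK_GET = 1, WALK_PUT = 2 };  /* same values as in the patch */

struct toy_nd {
    int depth;
    char log[32];  /* 'P' = put_link, 'G' = symlink picked up */
};

static void toy_log(struct toy_nd *nd, char c)
{
    size_t n = strlen(nd->log);

    if (n + 1 < sizeof(nd->log)) {
        nd->log[n] = c;
        nd->log[n + 1] = '\0';
    }
}

static void toy_put_link(struct toy_nd *nd)
{
    assert(nd->depth > 0);
    nd->depth--;
    toy_log(nd, 'P');
}

/* Returns 1 if a symlink was picked up for the caller to follow, else 0. */
static int toy_walk_component(struct toy_nd *nd, int is_symlink, int flags)
{
    if (flags & WALK_PUT)  /* the pending put happens first */
        toy_put_link(nd);
    if (is_symlink && (flags & WALK_GET)) {
        toy_log(nd, 'G');
        return 1;
    }
    return 0;
}

/* Flag selection with the same shape as the ternary in lookup_last(). */
static int toy_last_flags(int follow, int depth)
{
    return follow ? (depth ? WALK_PUT | WALK_GET : WALK_GET) : 0;
}
```

With one link pending and a symlink as the next component, WALK_GET | WALK_PUT produces the log "PG": the old link is released before the new one is picked up, never between picking it up and acting on it.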

2015-05-11 18:24:29

by Al Viro

Subject: [PATCH v3 071/110] namei: make should_follow_link() store the link in nd->link

From: Al Viro <[email protected]>

... if it decides to follow, that is.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 62 +++++++++++++++++++++++++++++++++-----------------------------
1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1c9af925..1b4bc1b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1566,15 +1566,31 @@ static void terminate_walk(struct nameidata *nd)
put_link(nd);
}

+static int pick_link(struct nameidata *nd, struct path *link)
+{
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(nd->path.mnt != link->mnt ||
+ unlazy_walk(nd, link->dentry))) {
+ return -ECHILD;
+ }
+ }
+ nd->link = *link;
+ return 1;
+}
+
/*
* Do we need to follow links? We _really_ want to be able
* to do this check without having to look at inode->i_op,
* so we keep a cache of "no, this doesn't need follow_link"
* for the common case.
*/
-static inline int should_follow_link(struct dentry *dentry, int follow)
+static inline int should_follow_link(struct nameidata *nd, struct path *link, int follow)
{
- return unlikely(d_is_symlink(dentry)) ? follow : 0;
+ if (likely(!d_is_symlink(link->dentry)))
+ return 0;
+ if (!follow)
+ return 0;
+ return pick_link(nd, link);
}

enum {WALK_GET = 1, WALK_PUT = 2};
@@ -1612,17 +1628,9 @@ static int walk_component(struct nameidata *nd, int flags)

if (flags & WALK_PUT)
put_link(nd);
- if (should_follow_link(path.dentry, flags & WALK_GET)) {
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path.mnt ||
- unlazy_walk(nd, path.dentry))) {
- return -ECHILD;
- }
- }
- BUG_ON(inode != path.dentry->d_inode);
- nd->link = path;
- return 1;
- }
+ err = should_follow_link(nd, &path, flags & WALK_GET);
+ if (unlikely(err))
+ return err;
path_to_nameidata(&path, nd);
nd->inode = inode;
return 0;
@@ -2375,9 +2383,11 @@ done:
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
- if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) {
- nd->link = *path;
- return 1;
+ error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW);
+ if (unlikely(error)) {
+ if (error < 0)
+ goto out;
+ return error;
}
mntget(path->mnt);
follow_mount(path);
@@ -3095,19 +3105,13 @@ retry_lookup:
goto out;
}
finish_lookup:
- if (should_follow_link(path.dentry, nd->flags & LOOKUP_FOLLOW)) {
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != path.mnt ||
- unlazy_walk(nd, path.dentry))) {
- error = -ECHILD;
- goto out;
- }
- }
- BUG_ON(inode != path.dentry->d_inode);
- if (nd->depth)
- put_link(nd);
- nd->link = path;
- return 1;
+ if (nd->depth)
+ put_link(nd);
+ error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW);
+ if (unlikely(error)) {
+ if (error < 0)
+ goto out;
+ return error;
}

if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
--
2.1.4

2015-05-11 18:23:11

by Al Viro

Subject: [PATCH v3 072/110] namei: move link count check and stack allocation into pick_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1b4bc1b..046a703 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -891,16 +891,10 @@ const char *get_link(struct nameidata *nd)
if (nd->link.mnt == nd->path.mnt)
mntget(nd->link.mnt);

- if (unlikely(current->total_link_count >= MAXSYMLINKS)) {
- path_put(&nd->link);
- return ERR_PTR(-ELOOP);
- }
-
last->link = nd->link;
last->cookie = NULL;

cond_resched();
- current->total_link_count++;

touch_atime(&last->link);

@@ -1568,12 +1562,23 @@ static void terminate_walk(struct nameidata *nd)

static int pick_link(struct nameidata *nd, struct path *link)
{
+ int error;
+ if (unlikely(current->total_link_count++ >= MAXSYMLINKS)) {
+ path_to_nameidata(link, nd);
+ return -ELOOP;
+ }
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != link->mnt ||
unlazy_walk(nd, link->dentry))) {
return -ECHILD;
}
}
+ error = nd_alloc_stack(nd);
+ if (unlikely(error)) {
+ path_to_nameidata(link, nd);
+ return error;
+ }
+
nd->link = *link;
return 1;
}
@@ -1840,15 +1845,7 @@ OK:
break;

if (err) {
- const char *s;
-
- err = nd_alloc_stack(nd);
- if (unlikely(err)) {
- path_to_nameidata(&nd->link, nd);
- break;
- }
-
- s = get_link(nd);
+ const char *s = get_link(nd);

if (unlikely(IS_ERR(s))) {
err = PTR_ERR(s);
--
2.1.4
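
A small worked example of the link count check that the patch above moves into pick_link(). It is a user-space sketch, not the kernel code: toy_pick_link and toy_links_allowed are hypothetical names, and TOY_MAXSYMLINKS is set to 40 on the assumption that it matches the limit named in the cover letter. Because the test uses a post-increment, the counter is compared before it is bumped.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAXSYMLINKS 40  /* assumed to match the limit in the cover letter */

/* Same shape as the check in pick_link(): compare, then increment. */
static int toy_pick_link(int *total_link_count)
{
    if ((*total_link_count)++ >= TOY_MAXSYMLINKS)
        return -ELOOP;
    return 1;
}

/* Counts how many links can be picked up before -ELOOP comes back. */
static int toy_links_allowed(void)
{
    int count = 0, picked = 0;

    while (toy_pick_link(&count) == 1)
        picked++;
    return picked;
}
```

In this model exactly 40 links are accepted and the 41st attempt fails with -ELOOP; the counter is still incremented on the failing call.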

2015-05-11 18:23:08

by Al Viro

Subject: [PATCH v3 073/110] lustre: rip the private symlink nesting limit out

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
drivers/staging/lustre/lustre/llite/symlink.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index e488cb3..da6d9d1 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -126,18 +126,9 @@ static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct n
char *symname = NULL;

CDEBUG(D_VFSTRACE, "VFS Op\n");
- /* Limit the recursive symlink depth to 5 instead of default
- * 8 links when kernel has 4k stack to prevent stack overflow.
- * For 8k stacks we need to limit it to 7 for local servers. */
- if (THREAD_SIZE < 8192 && current->link_count >= 6) {
- rc = -ELOOP;
- } else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
- rc = -ELOOP;
- } else {
- ll_inode_size_lock(inode);
- rc = ll_readlink_internal(inode, &request, &symname);
- ll_inode_size_unlock(inode);
- }
+ ll_inode_size_lock(inode);
+ rc = ll_readlink_internal(inode, &request, &symname);
+ ll_inode_size_unlock(inode);
if (rc) {
ptlrpc_req_finished(request);
return ERR_PTR(rc);
--
2.1.4

2015-05-11 18:23:05

by Al Viro

Subject: [PATCH v3 074/110] VFS: replace {, total_}link_count in task_struct with pointer to nameidata

From: NeilBrown <[email protected]>

task_struct currently contains two ad-hoc members for use by the VFS:
link_count and total_link_count. These are only interesting to fs/namei.c,
so exposing them explicitly is poor layering. Incidentally, link_count
isn't used anymore, so it can just die.

This patch replaces those with a single pointer to 'struct nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead in the _very_ few instances that care about
it at all.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save any old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.

follow_mount and follow_automount now get a 'struct nameidata *'
rather than 'int flags' so that they can directly access
total_link_count, rather than going through 'current'.

Suggested-by: Al Viro <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 70 ++++++++++++++++++++++++++++-----------------------
include/linux/sched.h | 2 +-
2 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 046a703..699e093 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,7 @@ struct nameidata {
unsigned seq, m_seq;
int last_type;
unsigned depth;
+ int total_link_count;
struct file *base;
struct saved {
struct path link;
@@ -513,16 +514,25 @@ struct nameidata {
} *stack, internal[EMBEDDED_LEVELS];
};

-static void set_nameidata(struct nameidata *nd)
+static struct nameidata *set_nameidata(struct nameidata *p)
{
- nd->stack = nd->internal;
+ struct nameidata *old = current->nameidata;
+ p->stack = p->internal;
+ p->total_link_count = old ? old->total_link_count : 0;
+ current->nameidata = p;
+ return old;
}

-static void restore_nameidata(struct nameidata *nd)
+static void restore_nameidata(struct nameidata *old)
{
- if (nd->stack != nd->internal) {
- kfree(nd->stack);
- nd->stack = nd->internal;
+ struct nameidata *now = current->nameidata;
+
+ current->nameidata = old;
+ if (old)
+ old->total_link_count = now->total_link_count;
+ if (now->stack != now->internal) {
+ kfree(now->stack);
+ now->stack = now->internal;
}
}

@@ -970,7 +980,7 @@ EXPORT_SYMBOL(follow_up);
* - return -EISDIR to tell follow_managed() to stop and return the path we
* were called with.
*/
-static int follow_automount(struct path *path, unsigned flags,
+static int follow_automount(struct path *path, struct nameidata *nd,
bool *need_mntput)
{
struct vfsmount *mnt;
@@ -990,13 +1000,13 @@ static int follow_automount(struct path *path, unsigned flags,
* as being automount points. These will need the attentions
* of the daemon to instantiate them before they can be used.
*/
- if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
path->dentry->d_inode)
return -EISDIR;

- current->total_link_count++;
- if (current->total_link_count >= 40)
+ nd->total_link_count++;
+ if (nd->total_link_count >= 40)
return -ELOOP;

mnt = path->dentry->d_op->d_automount(path);
@@ -1010,7 +1020,7 @@ static int follow_automount(struct path *path, unsigned flags,
* the path being looked up; if it wasn't then the remainder of
* the path is inaccessible and we should say so.
*/
- if (PTR_ERR(mnt) == -EISDIR && (flags & LOOKUP_PARENT))
+ if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
return -EREMOTE;
return PTR_ERR(mnt);
}
@@ -1050,7 +1060,7 @@ static int follow_automount(struct path *path, unsigned flags,
*
* Serialization is taken care of in namespace.c
*/
-static int follow_managed(struct path *path, unsigned flags)
+static int follow_managed(struct path *path, struct nameidata *nd)
{
struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
unsigned managed;
@@ -1094,7 +1104,7 @@ static int follow_managed(struct path *path, unsigned flags)

/* Handle an automount point */
if (managed & DCACHE_NEED_AUTOMOUNT) {
- ret = follow_automount(path, flags, &need_mntput);
+ ret = follow_automount(path, nd, &need_mntput);
if (ret < 0)
break;
continue;
@@ -1483,7 +1493,7 @@ unlazy:
}
path->mnt = mnt;
path->dentry = dentry;
- err = follow_managed(path, nd->flags);
+ err = follow_managed(path, nd);
if (unlikely(err < 0)) {
path_put_conditional(path, nd);
return err;
@@ -1513,7 +1523,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
return PTR_ERR(dentry);
path->mnt = nd->path.mnt;
path->dentry = dentry;
- err = follow_managed(path, nd->flags);
+ err = follow_managed(path, nd);
if (unlikely(err < 0)) {
path_put_conditional(path, nd);
return err;
@@ -1563,7 +1573,7 @@ static void terminate_walk(struct nameidata *nd)
static int pick_link(struct nameidata *nd, struct path *link)
{
int error;
- if (unlikely(current->total_link_count++ >= MAXSYMLINKS)) {
+ if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
path_to_nameidata(link, nd);
return -ELOOP;
}
@@ -1981,7 +1991,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
rcu_read_unlock();
return -ECHILD;
done:
- current->total_link_count = 0;
+ nd->total_link_count = 0;
return link_path_walk(s, nd);
}

@@ -2087,10 +2097,9 @@ static int filename_lookup(int dfd, struct filename *name,
unsigned int flags, struct nameidata *nd)
{
int retval;
+ struct nameidata *saved_nd = set_nameidata(nd);

- set_nameidata(nd);
retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
-
if (unlikely(retval == -ECHILD))
retval = path_lookupat(dfd, name, flags, nd);
if (unlikely(retval == -ESTALE))
@@ -2098,7 +2107,7 @@ static int filename_lookup(int dfd, struct filename *name,

if (likely(!retval))
audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
- restore_nameidata(nd);
+ restore_nameidata(saved_nd);
return retval;
}

@@ -2426,11 +2435,11 @@ static int
filename_mountpoint(int dfd, struct filename *name, struct path *path,
unsigned int flags)
{
- struct nameidata nd;
+ struct nameidata nd, *saved;
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- set_nameidata(&nd);
+ saved = set_nameidata(&nd);
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
if (unlikely(error == -ECHILD))
error = path_mountpoint(dfd, name, path, &nd, flags);
@@ -2438,7 +2447,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
if (likely(!error))
audit_inode(name, path->dentry, 0);
- restore_nameidata(&nd);
+ restore_nameidata(saved);
putname(name);
return error;
}
@@ -3087,7 +3096,7 @@ retry_lookup:
if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
goto exit_dput;

- error = follow_managed(&path, nd->flags);
+ error = follow_managed(&path, nd);
if (error < 0)
goto exit_dput;

@@ -3323,31 +3332,29 @@ out2:
struct file *do_filp_open(int dfd, struct filename *pathname,
const struct open_flags *op)
{
- struct nameidata nd;
+ struct nameidata nd, *saved_nd = set_nameidata(&nd);
int flags = op->lookup_flags;
struct file *filp;

- set_nameidata(&nd);
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
filp = path_openat(dfd, pathname, &nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(&nd);
+ restore_nameidata(saved_nd);
return filp;
}

struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
const char *name, const struct open_flags *op)
{
- struct nameidata nd;
+ struct nameidata nd, *saved_nd;
struct file *file;
struct filename *filename;
int flags = op->lookup_flags | LOOKUP_ROOT;

nd.root.mnt = mnt;
nd.root.dentry = dentry;
- set_nameidata(&nd);

if (d_is_symlink(dentry) && op->intent & LOOKUP_OPEN)
return ERR_PTR(-ELOOP);
@@ -3356,12 +3363,13 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
if (unlikely(IS_ERR(filename)))
return ERR_CAST(filename);

+ saved_nd = set_nameidata(&nd);
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_RCU);
if (unlikely(file == ERR_PTR(-ECHILD)))
file = path_openat(-1, filename, &nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(&nd);
+ restore_nameidata(saved_nd);
putname(filename);
return file;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 26a2e61..f6c9b69 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1461,7 +1461,7 @@ struct task_struct {
it with task_lock())
- initialized normally by setup_new_exec */
/* file system info */
- int link_count, total_link_count;
+ struct nameidata *nameidata;
#ifdef CONFIG_SYSVIPC
/* ipc stuff */
struct sysv_sem sysvsem;
--
2.1.4
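
The save/restore discipline described in the commit message above can be modelled in user space. This is a sketch under assumptions, not the kernel code: toy_current_nd is a plain global standing in for current->nameidata, the toy_ names are hypothetical, and both the stack handling and the reset done in path_init() are left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_nameidata {
    int total_link_count;
};

/* Stand-in for current->nameidata; a plain global in this sketch. */
static struct toy_nameidata *toy_current_nd;

/* Installs p as the active lookup and returns the one it displaced. */
static struct toy_nameidata *toy_set_nameidata(struct toy_nameidata *p)
{
    struct toy_nameidata *old = toy_current_nd;

    p->total_link_count = old ? old->total_link_count : 0;
    toy_current_nd = p;
    return old;
}

/* Reinstates the displaced lookup and hands the link count back to it. */
static void toy_restore_nameidata(struct toy_nameidata *old)
{
    struct toy_nameidata *now = toy_current_nd;

    toy_current_nd = old;
    if (old)
        old->total_link_count = now->total_link_count;
}

/* A nested lookup that follows `links` symlinks on top of the outer count. */
static void toy_nested_lookup(int links)
{
    struct toy_nameidata nd;
    struct toy_nameidata *saved = toy_set_nameidata(&nd);

    nd.total_link_count += links;
    toy_restore_nameidata(saved);
}
```

If an outer lookup has counted 3 links and a nested one follows 2 more, the outer structure ends up with 5 once the nested lookup has restored it, so the count stays cumulative across the recursion.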

2015-05-11 18:23:01

by Al Viro

Subject: [PATCH v3 075/110] namei: simplify the callers of follow_managed()

From: Al Viro <[email protected]>

now that it gets nameidata, no reason to have setting LOOKUP_JUMPED on
mountpoint crossing and calling path_put_conditional() on failures
done in every caller.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 32 ++++++++++----------------------
1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 699e093..b57400c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1118,7 +1118,11 @@ static int follow_managed(struct path *path, struct nameidata *nd)
mntput(path->mnt);
if (ret == -EISDIR)
ret = 0;
- return ret < 0 ? ret : need_mntput;
+ if (need_mntput)
+ nd->flags |= LOOKUP_JUMPED;
+ if (unlikely(ret < 0))
+ path_put_conditional(path, nd);
+ return ret;
}

int follow_down_one(struct path *path)
@@ -1494,14 +1498,9 @@ unlazy:
path->mnt = mnt;
path->dentry = dentry;
err = follow_managed(path, nd);
- if (unlikely(err < 0)) {
- path_put_conditional(path, nd);
- return err;
- }
- if (err)
- nd->flags |= LOOKUP_JUMPED;
- *inode = path->dentry->d_inode;
- return 0;
+ if (likely(!err))
+ *inode = path->dentry->d_inode;
+ return err;

need_lookup:
return 1;
@@ -1511,7 +1510,6 @@ need_lookup:
static int lookup_slow(struct nameidata *nd, struct path *path)
{
struct dentry *dentry, *parent;
- int err;

parent = nd->path.dentry;
BUG_ON(nd->inode != parent->d_inode);
@@ -1523,14 +1521,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
return PTR_ERR(dentry);
path->mnt = nd->path.mnt;
path->dentry = dentry;
- err = follow_managed(path, nd);
- if (unlikely(err < 0)) {
- path_put_conditional(path, nd);
- return err;
- }
- if (err)
- nd->flags |= LOOKUP_JUMPED;
- return 0;
+ return follow_managed(path, nd);
}

static inline int may_lookup(struct nameidata *nd)
@@ -3098,10 +3089,7 @@ retry_lookup:

error = follow_managed(&path, nd);
if (error < 0)
- goto exit_dput;
-
- if (error)
- nd->flags |= LOOKUP_JUMPED;
+ goto out;

BUG_ON(nd->flags & LOOKUP_RCU);
inode = path.dentry->d_inode;
--
2.1.4
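
The simplification in the patch above follows a common pattern: once the callee has the context structure, it can record its own side effects, and every caller shrinks. A user-space sketch, not the kernel code: toy_nd, toy_follow_managed and toy_lookup_slow are hypothetical, TOY_LOOKUP_JUMPED is an arbitrary bit, and a counter stands in for path_put_conditional().

```c
#include <assert.h>

#define TOY_LOOKUP_JUMPED 0x1000  /* arbitrary bit for the sketch */

struct toy_nd {
    unsigned flags;
    int cleanups;  /* times the failure cleanup ran */
};

/*
 * crossed: whether a mountpoint was crossed; ret: 0 or a negative errno.
 * The callee records both side effects itself and returns only 0 or an
 * error, instead of reporting "crossed" through a positive return value.
 */
static int toy_follow_managed(struct toy_nd *nd, int crossed, int ret)
{
    if (crossed)
        nd->flags |= TOY_LOOKUP_JUMPED;
    if (ret < 0)
        nd->cleanups++;  /* stands in for path_put_conditional() */
    return ret;
}

/* A caller in the new style: nothing left to do but pass the result on. */
static int toy_lookup_slow(struct toy_nd *nd, int crossed, int ret)
{
    return toy_follow_managed(nd, crossed, ret);
}
```

The caller no longer needs a local err variable or a branch on a positive return, which is the shape lookup_slow() takes in the diff above.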

2015-05-11 18:21:22

by Al Viro

Subject: [PATCH v3 076/110] don't pass nameidata to ->follow_link()

From: Al Viro <[email protected]>

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/Locking | 2 +-
Documentation/filesystems/vfs.txt | 2 +-
drivers/staging/lustre/lustre/llite/symlink.c | 2 +-
fs/9p/vfs_inode.c | 2 +-
fs/9p/vfs_inode_dotl.c | 2 +-
fs/autofs4/symlink.c | 2 +-
fs/befs/linuxvfs.c | 4 ++--
fs/cifs/cifsfs.h | 2 +-
fs/cifs/link.c | 2 +-
fs/configfs/symlink.c | 2 +-
fs/ecryptfs/inode.c | 2 +-
fs/ext4/symlink.c | 2 +-
fs/f2fs/namei.c | 4 ++--
fs/fuse/dir.c | 2 +-
fs/gfs2/inode.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hppfs/hppfs.c | 4 ++--
fs/kernfs/symlink.c | 2 +-
fs/libfs.c | 2 +-
fs/namei.c | 11 ++++++-----
fs/nfs/symlink.c | 2 +-
fs/overlayfs/inode.c | 4 ++--
fs/proc/base.c | 4 ++--
fs/proc/inode.c | 2 +-
fs/proc/namespaces.c | 4 ++--
fs/proc/self.c | 2 +-
fs/proc/thread_self.c | 2 +-
fs/xfs/xfs_iops.c | 3 +--
include/linux/fs.h | 6 +++---
include/linux/namei.h | 2 +-
mm/shmem.c | 2 +-
31 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 7fa6c4a..5b5b4f5 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,7 +50,7 @@ prototypes:
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char *(*follow_link) (struct dentry *, void **);
void (*put_link) (struct dentry *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 1c6b03a..0dec8c8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,7 +350,7 @@ struct inode_operations {
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- const char *(*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char *(*follow_link) (struct dentry *, void **);
void (*put_link) (struct dentry *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index da6d9d1..f3be3bf 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
return rc;
}

-static const char *ll_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ll_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct ptlrpc_request *request = NULL;
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 7cc70a3..271f51a 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1230,7 +1230,7 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
*
*/

-static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie)
{
struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
struct p9_fid *fid = v9fs_fid_lookup(dentry);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index ae062ff..16658ed 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -910,7 +910,7 @@ error:
*/

static const char *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie, struct nameidata *nd)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, void **cookie)
{
struct p9_fid *fid = v9fs_fid_lookup(dentry);
char *target;
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index 9c6a077..da0c334 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,7 +12,7 @@

#include "autofs_i.h"

-static const char *autofs4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *autofs4_follow_link(struct dentry *dentry, void **cookie)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 3a1aefb..46aedac 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,7 +42,7 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
static struct inode *befs_alloc_inode(struct super_block *sb);
static void befs_destroy_inode(struct inode *inode);
static void befs_destroy_inodecache(void);
-static const char *befs_follow_link(struct dentry *, void **, struct nameidata *nd);
+static const char *befs_follow_link(struct dentry *, void **);
static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -464,7 +464,7 @@ befs_destroy_inodecache(void)
* flag is set.
*/
static const char *
-befs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+befs_follow_link(struct dentry *dentry, void **cookie)
{
struct super_block *sb = dentry->d_sb;
struct befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 61012da..a782b22 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
#endif

/* Functions related to symlinks */
-extern const char *cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd);
+extern const char *cifs_follow_link(struct dentry *direntry, void **cookie);
extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
int buflen);
extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 4a439c2..546f86a 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,7 +627,7 @@ cifs_hl_exit:
}

const char *
-cifs_follow_link(struct dentry *direntry, void **cookie, struct nameidata *nd)
+cifs_follow_link(struct dentry *direntry, void **cookie)
{
struct inode *inode = d_inode(direntry);
int rc = -ENOMEM;
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index fac8e85..0ace756 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,7 +279,7 @@ static int configfs_getlink(struct dentry *dentry, char * path)

}

-static const char *configfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *configfs_follow_link(struct dentry *dentry, void **cookie)
{
unsigned long page = get_zeroed_page(GFP_KERNEL);
int error;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index cdb9d6c..73d20ae 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -675,7 +675,7 @@ out:
return rc ? ERR_PTR(rc) : buf;
}

-static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ecryptfs_follow_link(struct dentry *dentry, void **cookie)
{
size_t len;
char *buf = ecryptfs_readlink_lower(dentry, &len);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index afec475..ba5bd18 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,7 +23,7 @@
#include "xattr.h"

#ifdef CONFIG_EXT4_FS_ENCRYPTION
-static const char *ext4_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ext4_follow_link(struct dentry *dentry, void **cookie)
{
struct page *cpage = NULL;
char *caddr, *paddr = NULL;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index d294793..cd05a7c 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -296,9 +296,9 @@ fail:
return err;
}

-static const char *f2fs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *f2fs_follow_link(struct dentry *dentry, void **cookie)
{
- const char *link = page_follow_link_light(dentry, cookie, nd);
+ const char *link = page_follow_link_light(dentry, cookie);
if (!IS_ERR(link) && !*link) {
/* this is broken symlink case */
page_put_link(dentry, *cookie);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f9cb260..d5cdef8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1365,7 +1365,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
return err;
}

-static const char *fuse_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *fuse_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index f59390a..3a1461d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1548,7 +1548,7 @@ out:
* Returns: 0 on success or error code
*/

-static const char *gfs2_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *gfs2_follow_link(struct dentry *dentry, void **cookie)
{
struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
struct gfs2_holder i_gh;
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index f650ed6..7b6ed7a 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -892,7 +892,7 @@ static const struct inode_operations hostfs_dir_iops = {
.setattr = hostfs_setattr,
};

-static const char *hostfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *hostfs_follow_link(struct dentry *dentry, void **cookie)
{
char *link = __getname();
if (link) {
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index b8f24d3..15a774e 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,11 +642,11 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
buflen);
}

-static const char *hppfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *hppfs_follow_link(struct dentry *dentry, void **cookie)
{
struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

- return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie, nd);
+ return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie);
}

static void hppfs_put_link(struct dentry *dentry, void *cookie)
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 3c7e799..366c5a1 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,7 +112,7 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
return error;
}

-static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie)
{
int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
diff --git a/fs/libfs.c b/fs/libfs.c
index 0c83fde..c5f3373 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1091,7 +1091,7 @@ simple_nosetlease(struct file *filp, long arg, struct file_lock **flp,
}
EXPORT_SYMBOL(simple_nosetlease);

-const char *simple_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+const char *simple_follow_link(struct dentry *dentry, void **cookie)
{
return d_inode(dentry)->i_link;
}
diff --git a/fs/namei.c b/fs/namei.c
index b57400c..f311f03 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -753,8 +753,9 @@ static inline void path_to_nameidata(const struct path *path,
* Helper to directly jump to a known parsed path from ->follow_link,
* caller must have taken a reference to path beforehand.
*/
-void nd_jump_link(struct nameidata *nd, struct path *path)
+void nd_jump_link(struct path *path)
{
+ struct nameidata *nd = current->nameidata;
path_put(&nd->path);

nd->path = *path;
@@ -916,7 +917,7 @@ const char *get_link(struct nameidata *nd)
nd->last_type = LAST_BIND;
res = inode->i_link;
if (!res) {
- res = inode->i_op->follow_link(dentry, &last->cookie, nd);
+ res = inode->i_op->follow_link(dentry, &last->cookie);
if (IS_ERR(res)) {
out:
path_put(&last->link);
@@ -4485,12 +4486,12 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
int res;

if (!link) {
- link = dentry->d_inode->i_op->follow_link(dentry, &cookie, NULL);
+ link = dentry->d_inode->i_op->follow_link(dentry, &cookie);
if (IS_ERR(link))
return PTR_ERR(link);
}
res = readlink_copy(buffer, buflen, link);
- if (cookie && dentry->d_inode->i_op->put_link)
+ if (dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, cookie);
return res;
}
@@ -4523,7 +4524,7 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
}
EXPORT_SYMBOL(page_readlink);

-const char *page_follow_link_light(struct dentry *dentry, void **cookie, struct nameidata *nd)
+const char *page_follow_link_light(struct dentry *dentry, void **cookie)
{
struct page *page = NULL;
char *res = page_getlink(dentry, &page);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index c992b20..b6de433 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -42,7 +42,7 @@ error:
return -EIO;
}

-static const char *nfs_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *nfs_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct page *page;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 235ad42..9986833 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,7 +140,7 @@ struct ovl_link_data {
void *cookie;
};

-static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *ovl_follow_link(struct dentry *dentry, void **cookie)
{
struct dentry *realdentry;
struct inode *realinode;
@@ -160,7 +160,7 @@ static const char *ovl_follow_link(struct dentry *dentry, void **cookie, struct
data->realdentry = realdentry;
}

- ret = realinode->i_op->follow_link(realdentry, cookie, nd);
+ ret = realinode->i_op->follow_link(realdentry, cookie);
if (IS_ERR_OR_NULL(ret)) {
kfree(data);
return ret;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 52652f8..286a422 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1380,7 +1380,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
return -ENOENT;
}

-static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
struct path path;
@@ -1394,7 +1394,7 @@ static const char *proc_pid_follow_link(struct dentry *dentry, void **cookie, st
if (error)
goto out;

- nd_jump_link(nd, &path);
+ nd_jump_link(&path);
return NULL;
out:
return ERR_PTR(error);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index acd51d7..eb35874 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -393,7 +393,7 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
};
#endif

-static const char *proc_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_follow_link(struct dentry *dentry, void **cookie)
{
struct proc_dir_entry *pde = PDE(d_inode(dentry));
if (unlikely(!use_pde(pde)))
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 10d24dd..f6e8354 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
&mntns_operations,
};

-static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie)
{
struct inode *inode = d_inode(dentry);
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
@@ -45,7 +45,7 @@ static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie, str
if (ptrace_may_access(task, PTRACE_MODE_READ)) {
error = ns_get_path(&ns_path, task, ns_ops);
if (!error)
- nd_jump_link(nd, &ns_path);
+ nd_jump_link(&ns_path);
}
put_task_struct(task);
return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index ad33394..113b8d0 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -18,7 +18,7 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static const char *proc_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_self_follow_link(struct dentry *dentry, void **cookie)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 85c96e0..947b0f4 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -19,7 +19,7 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
return readlink_copy(buffer, buflen, tmp);
}

-static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *proc_thread_self_follow_link(struct dentry *dentry, void **cookie)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
pid_t tgid = task_tgid_nr_ns(current, ns);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 26c4dcb..7f51f39 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -416,8 +416,7 @@ xfs_vn_rename(
STATIC const char *
xfs_vn_follow_link(
struct dentry *dentry,
- void **cookie,
- struct nameidata *nd)
+ void **cookie)
{
char *link;
int error = -ENOMEM;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab9341..ed7c9f2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1608,7 +1608,7 @@ struct file_operations {

struct inode_operations {
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
- const char * (*follow_link) (struct dentry *, void **, struct nameidata *);
+ const char * (*follow_link) (struct dentry *, void **);
int (*permission) (struct inode *, int);
struct posix_acl * (*get_acl)(struct inode *, int);

@@ -2705,7 +2705,7 @@ extern const struct file_operations generic_ro_fops;

extern int readlink_copy(char __user *, int, const char *);
extern int page_readlink(struct dentry *, char __user *, int);
-extern const char *page_follow_link_light(struct dentry *, void **, struct nameidata *);
+extern const char *page_follow_link_light(struct dentry *, void **);
extern void page_put_link(struct dentry *, void *);
extern int __page_symlink(struct inode *inode, const char *symname, int len,
int nofs);
@@ -2722,7 +2722,7 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);
-const char *simple_follow_link(struct dentry *, void **, struct nameidata *);
+const char *simple_follow_link(struct dentry *, void **);
extern const struct inode_operations simple_symlink_inode_operations;

extern int iterate_dir(struct file *, struct dir_context *);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 3a6cc96..d756304 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -72,7 +72,7 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);

-extern void nd_jump_link(struct nameidata *nd, struct path *path);
+extern void nd_jump_link(struct path *path);

static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index d1693dc..e026822 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2475,7 +2475,7 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
return 0;
}

-static const char *shmem_follow_link(struct dentry *dentry, void **cookie, struct nameidata *nd)
+static const char *shmem_follow_link(struct dentry *dentry, void **cookie)
{
struct page *page = NULL;
int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
--
2.1.4
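
The nd_jump_link() hunk above stops taking the nameidata as an argument and picks it up from current->nameidata instead. A minimal userspace sketch of that pattern follows. All names in it (walk_state, current_walk, set_walk(), restore_walk(), jump_link()) are hypothetical stand-ins, and a plain global models the per-task pointer, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata. */
struct walk_state {
	const char *path;	/* models nd->path */
};

/*
 * Models current->nameidata.  In the kernel the pointer is per-task;
 * a plain global keeps this sketch simple (and single-threaded).
 */
static struct walk_state *current_walk;

/* Install a walk state, returning the previous one so it can be restored. */
static struct walk_state *set_walk(struct walk_state *w)
{
	struct walk_state *old = current_walk;

	current_walk = w;
	return old;
}

static void restore_walk(struct walk_state *old)
{
	current_walk = old;
}

/* Models nd_jump_link(path): the walk state is found, not passed in. */
static void jump_link(const char *path)
{
	assert(current_walk != NULL);
	current_walk->path = path;
}
```

The point of the design is that a callback such as ->follow_link() no longer needs the walk state in its signature. The cost is that a jump is only legal while a walk is installed, which the assert() models here.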

2015-05-11 18:21:14

by Al Viro

Subject: [PATCH v3 077/110] namei: simplify failure exits in get_link()

From: Al Viro <[email protected]>

when cookie is NULL, put_link() is equivalent to path_put(), so
as soon as we've set last->cookie to NULL, we can bump nd->depth and
let the normal logic in terminate_walk() take care of cleanups.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f311f03..678aeef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -904,27 +904,23 @@ const char *get_link(struct nameidata *nd)

last->link = nd->link;
last->cookie = NULL;
+ nd->depth++;

cond_resched();

touch_atime(&last->link);

error = security_inode_follow_link(dentry);
- res = ERR_PTR(error);
if (error)
- goto out;
+ return ERR_PTR(error);

nd->last_type = LAST_BIND;
res = inode->i_link;
if (!res) {
res = inode->i_op->follow_link(dentry, &last->cookie);
- if (IS_ERR(res)) {
-out:
- path_put(&last->link);
- return res;
- }
+ if (IS_ERR_OR_NULL(res))
+ last->cookie = NULL;
}
- nd->depth++;
return res;
}

--
2.1.4

2015-05-11 18:21:20

by Al Viro

Subject: [PATCH v3 078/110] namei: simpler treatment of symlinks with nothing other than / in the body

From: Al Viro <[email protected]>

Instead of saving name and branching to OK:, where we'd immediately restore
it and call walk_component() with WALK_PUT|WALK_GET and nd->last_type being
LAST_BIND (which is equivalent to put_link(nd), err = 0), we can just treat
that the same way we treat procfs-style "jump" symlinks - do put_link(nd)
and move on.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 678aeef..c5eb77a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1865,11 +1865,13 @@ OK:
;
}
nd->inode = nd->path.dentry->d_inode;
- nd->stack[nd->depth - 1].name = name;
- if (!*s)
- goto OK;
- name = s;
- continue;
+ if (unlikely(!*s)) {
+ put_link(nd);
+ } else {
+ nd->stack[nd->depth - 1].name = name;
+ name = s;
+ continue;
+ }
}
}
if (!d_can_lookup(nd->path.dentry)) {
--
2.1.4

2015-05-11 18:21:11

by Al Viro

Subject: [PATCH v3 079/110] namei: take the treatment of absolute symlinks to get_link()

From: Al Viro <[email protected]>

rather than letting the callers handle the jump-to-root part of the
semantics, do it right in get_link() and return the rest of the
body for the caller to deal with - at that point it's treated
the same way as a relative symlink would be. And return NULL
when there's no "rest of the body" - those are treated the same
as a pure jump symlink would be.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 49 ++++++++++++++++++++-----------------------------
1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5eb77a..c6ff9da 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -918,9 +918,24 @@ const char *get_link(struct nameidata *nd)
res = inode->i_link;
if (!res) {
res = inode->i_op->follow_link(dentry, &last->cookie);
- if (IS_ERR_OR_NULL(res))
+ if (IS_ERR_OR_NULL(res)) {
last->cookie = NULL;
+ return res;
+ }
+ }
+ if (*res == '/') {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->inode = nd->path.dentry->d_inode;
+ nd->flags |= LOOKUP_JUMPED;
+ while (unlikely(*++res == '/'))
+ ;
}
+ if (!*res)
+ res = NULL;
return res;
}

@@ -1854,24 +1869,9 @@ OK:
/* jumped */
put_link(nd);
} else {
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- while (unlikely(*++s == '/'))
- ;
- }
- nd->inode = nd->path.dentry->d_inode;
- if (unlikely(!*s)) {
- put_link(nd);
- } else {
- nd->stack[nd->depth - 1].name = name;
- name = s;
- continue;
- }
+ nd->stack[nd->depth - 1].name = name;
+ name = s;
+ continue;
}
}
if (!d_can_lookup(nd->path.dentry)) {
@@ -2002,6 +2002,7 @@ static int trailing_symlink(struct nameidata *nd)
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
+ nd->stack[0].name = NULL;
s = get_link(nd);
if (unlikely(IS_ERR(s))) {
terminate_walk(nd);
@@ -2009,16 +2010,6 @@ static int trailing_symlink(struct nameidata *nd)
}
if (unlikely(!s))
return 0;
- if (*s == '/') {
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->flags |= LOOKUP_JUMPED;
- }
- nd->inode = nd->path.dentry->d_inode;
- nd->stack[0].name = NULL;
return link_path_walk(s, nd);
}

--
2.1.4
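
The string handling that this patch moves into get_link() can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: split_link_body() and its "jumped" out-parameter are hypothetical names, and "jumped" merely marks the spot where the kernel resets nd->path to nd->root and sets LOOKUP_JUMPED.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the tail of get_link() after this patch.
 * Returns the part of the symlink body that still has to be walked, or
 * NULL when nothing is left.  *jumped is set to 1 if the body was
 * absolute, i.e. where the kernel would jump to the root.
 */
static const char *split_link_body(const char *body, int *jumped)
{
	assert(body != NULL && jumped != NULL);

	*jumped = 0;
	if (*body == '/') {
		*jumped = 1;
		/* skip the leading slash and any repeats of it */
		while (*++body == '/')
			;
	}
	/* empty remainder: handled like a pure "jump" symlink */
	return *body ? body : NULL;
}
```

With this shape the caller sees only two cases: a non-NULL remainder that is walked exactly like a relative symlink body, or NULL, meaning there is nothing more to walk. That is why both link_path_walk() and trailing_symlink() lose their own copies of the '/' handling in the diff above.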

2015-05-11 18:21:09

by Al Viro

Subject: [PATCH v3 080/110] namei: fold put_link() into the failure case of complete_walk()

From: Al Viro <[email protected]>

... and don't open-code unlazy_walk() in there - the only reason
for that is to avoid verification of cached nd->root, which is
trivially avoided by discarding said cached nd->root first.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 29 ++++++-----------------------
1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c6ff9da..55283fe 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,6 +567,8 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* to restart the path walk from the beginning in ref-walk mode.
*/

+static void terminate_walk(struct nameidata *nd);
+
/**
* unlazy_walk - try to switch to ref-walk mode.
* @nd: nameidata pathwalk data
@@ -673,26 +675,12 @@ static int complete_walk(struct nameidata *nd)
int status;

if (nd->flags & LOOKUP_RCU) {
- nd->flags &= ~LOOKUP_RCU;
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
-
- if (!legitimize_mnt(nd->path.mnt, nd->m_seq)) {
- rcu_read_unlock();
- return -ECHILD;
- }
- if (unlikely(!lockref_get_not_dead(&dentry->d_lockref))) {
- rcu_read_unlock();
- mntput(nd->path.mnt);
- return -ECHILD;
- }
- if (read_seqcount_retry(&dentry->d_seq, nd->seq)) {
- rcu_read_unlock();
- dput(dentry);
- mntput(nd->path.mnt);
+ if (unlikely(unlazy_walk(nd, NULL))) {
+ terminate_walk(nd);
return -ECHILD;
}
- rcu_read_unlock();
}

if (likely(!(nd->flags & LOOKUP_JUMPED)))
@@ -708,7 +696,7 @@ static int complete_walk(struct nameidata *nd)
if (!status)
status = -ESTALE;

- path_put(&nd->path);
+ terminate_walk(nd);
return status;
}

@@ -3008,11 +2996,8 @@ static int do_last(struct nameidata *nd,
* about to look up
*/
error = complete_walk(nd);
- if (error) {
- if (nd->depth)
- put_link(nd);
+ if (error)
return error;
- }

audit_inode(name, dir, LOOKUP_PARENT);
error = -EISDIR;
@@ -3117,8 +3102,6 @@ finish_lookup:
finish_open:
error = complete_walk(nd);
if (error) {
- if (nd->depth)
- put_link(nd);
path_put(&save_parent);
return error;
}
--
2.1.4

2015-05-11 18:21:05

by Al Viro

Subject: [PATCH v3 081/110] namei: move bumping the refcount of link->mnt into pick_link()

From: Al Viro <[email protected]>

update the failure cleanup in may_follow_link() to match that.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 55283fe..05efcc0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -802,7 +802,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
return 0;

audit_log_link_denied("follow_link", link);
- path_put_conditional(link, nd);
+ path_put(link);
path_put(&nd->path);
return -EACCES;
}
@@ -887,9 +887,6 @@ const char *get_link(struct nameidata *nd)

BUG_ON(nd->flags & LOOKUP_RCU);

- if (nd->link.mnt == nd->path.mnt)
- mntget(nd->link.mnt);
-
last->link = nd->link;
last->cookie = NULL;
nd->depth++;
@@ -1574,9 +1571,11 @@ static int pick_link(struct nameidata *nd, struct path *link)
return -ECHILD;
}
}
+ if (link->mnt == nd->path.mnt)
+ mntget(link->mnt);
error = nd_alloc_stack(nd);
if (unlikely(error)) {
- path_to_nameidata(link, nd);
+ path_put(link);
return error;
}

--
2.1.4

2015-05-11 18:19:47

by Al Viro

Subject: [PATCH v3 082/110] may_follow_link(): trim arguments

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 05efcc0..82cb1bc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -765,7 +765,6 @@ int sysctl_protected_hardlinks __read_mostly = 0;

/**
* may_follow_link - Check symlink following for unsafe situations
- * @link: The path of the symlink
* @nd: nameidata pathwalk data
*
* In the case of the sysctl_protected_symlinks sysctl being enabled,
@@ -779,7 +778,7 @@ int sysctl_protected_hardlinks __read_mostly = 0;
*
* Returns 0 if following the symlink is allowed, -ve on error.
*/
-static inline int may_follow_link(struct path *link, struct nameidata *nd)
+static inline int may_follow_link(struct nameidata *nd)
{
const struct inode *inode;
const struct inode *parent;
@@ -788,7 +787,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
return 0;

/* Allowed if owner and follower match. */
- inode = link->dentry->d_inode;
+ inode = nd->link.dentry->d_inode;
if (uid_eq(current_cred()->fsuid, inode->i_uid))
return 0;

@@ -801,8 +800,8 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
if (uid_eq(parent->i_uid, inode->i_uid))
return 0;

- audit_log_link_denied("follow_link", link);
- path_put(link);
+ audit_log_link_denied("follow_link", &nd->link);
+ path_put(&nd->link);
path_put(&nd->path);
return -EACCES;
}
@@ -1985,7 +1984,7 @@ static void path_cleanup(struct nameidata *nd)
static int trailing_symlink(struct nameidata *nd)
{
const char *s;
- int error = may_follow_link(&nd->link, nd);
+ int error = may_follow_link(nd);
if (unlikely(error))
return error;
nd->flags |= LOOKUP_PARENT;
--
2.1.4

2015-05-11 18:19:45

by Al Viro

Subject: [PATCH v3 083/110] namei: kill nd->link

From: Al Viro <[email protected]>

Just store it in nd->stack[nd->depth].link right in pick_link().
Now that we make sure of stack expansion in pick_link(), we can
do so...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 82cb1bc..277ca86 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -495,10 +495,7 @@ EXPORT_SYMBOL(path_put);
#define EMBEDDED_LEVELS 2
struct nameidata {
struct path path;
- union {
- struct qstr last;
- struct path link;
- };
+ struct qstr last;
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
@@ -787,7 +784,7 @@ static inline int may_follow_link(struct nameidata *nd)
return 0;

/* Allowed if owner and follower match. */
- inode = nd->link.dentry->d_inode;
+ inode = nd->stack[0].link.dentry->d_inode;
if (uid_eq(current_cred()->fsuid, inode->i_uid))
return 0;

@@ -800,8 +797,8 @@ static inline int may_follow_link(struct nameidata *nd)
if (uid_eq(parent->i_uid, inode->i_uid))
return 0;

- audit_log_link_denied("follow_link", &nd->link);
- path_put(&nd->link);
+ audit_log_link_denied("follow_link", &nd->stack[0].link);
+ path_put(&nd->stack[0].link);
path_put(&nd->path);
return -EACCES;
}
@@ -879,14 +876,13 @@ static __always_inline
const char *get_link(struct nameidata *nd)
{
struct saved *last = nd->stack + nd->depth;
- struct dentry *dentry = nd->link.dentry;
+ struct dentry *dentry = last->link.dentry;
struct inode *inode = dentry->d_inode;
int error;
const char *res;

BUG_ON(nd->flags & LOOKUP_RCU);

- last->link = nd->link;
last->cookie = NULL;
nd->depth++;

@@ -1560,6 +1556,7 @@ static void terminate_walk(struct nameidata *nd)
static int pick_link(struct nameidata *nd, struct path *link)
{
int error;
+ struct saved *last;
if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
path_to_nameidata(link, nd);
return -ELOOP;
@@ -1578,7 +1575,8 @@ static int pick_link(struct nameidata *nd, struct path *link)
return error;
}

- nd->link = *link;
+ last = nd->stack + nd->depth;
+ last->link = *link;
return 1;
}

--
2.1.4

2015-05-11 18:19:44

by Al Viro

Subject: [PATCH v3 084/110] namei: take increment of nd->depth into pick_link()

From: Al Viro <[email protected]>

Makes the situation much more regular - we avoid a strange state
in which the element just past the top of the stack is used to store
the struct path of the symlink, but isn't counted in nd->depth. With
that state gone, the normal failure exits, etc., work fine.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 277ca86..6d4692d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -798,8 +798,7 @@ static inline int may_follow_link(struct nameidata *nd)
return 0;

audit_log_link_denied("follow_link", &nd->stack[0].link);
- path_put(&nd->stack[0].link);
- path_put(&nd->path);
+ terminate_walk(nd);
return -EACCES;
}

@@ -875,7 +874,7 @@ static int may_linkat(struct path *link)
static __always_inline
const char *get_link(struct nameidata *nd)
{
- struct saved *last = nd->stack + nd->depth;
+ struct saved *last = nd->stack + nd->depth - 1;
struct dentry *dentry = last->link.dentry;
struct inode *inode = dentry->d_inode;
int error;
@@ -883,9 +882,6 @@ const char *get_link(struct nameidata *nd)

BUG_ON(nd->flags & LOOKUP_RCU);

- last->cookie = NULL;
- nd->depth++;
-
cond_resched();

touch_atime(&last->link);
@@ -1575,8 +1571,9 @@ static int pick_link(struct nameidata *nd, struct path *link)
return error;
}

- last = nd->stack + nd->depth;
+ last = nd->stack + nd->depth++;
last->link = *link;
+ last->cookie = NULL;
return 1;
}

--
2.1.4
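
The stack convention this patch settles on (the entry is filled and counted in one step, and the symlink being followed is always the top of the stack) can be sketched as below. The types and helpers are hypothetical userspace stand-ins, and MAX_DEPTH is an arbitrary bound chosen for the sketch, not a kernel constant.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8		/* arbitrary bound for the sketch */

struct saved_model {		/* hypothetical stand-in for struct saved */
	const char *link;	/* models last->link */
	void *cookie;		/* models last->cookie */
};

struct walk_model {		/* hypothetical stand-in for struct nameidata */
	struct saved_model stack[MAX_DEPTH];
	unsigned int depth;	/* number of live entries */
};

/* pick_link() side: fill the new entry and count it in one step */
static int push_link(struct walk_model *w, const char *link)
{
	struct saved_model *last;

	if (w->depth >= MAX_DEPTH)
		return -1;
	last = w->stack + w->depth++;
	last->link = link;
	last->cookie = NULL;
	return 0;
}

/* get_link() side: the entry being followed is the top of the stack */
static struct saved_model *top_link(struct walk_model *w)
{
	assert(w->depth > 0);
	return w->stack + w->depth - 1;
}

/* cleanup in the style of terminate_walk(): drop whatever is counted */
static unsigned int drop_all(struct walk_model *w)
{
	unsigned int dropped = 0;

	while (w->depth) {
		w->depth--;
		dropped++;
	}
	return dropped;
}
```

Because every stored entry is reflected in depth, a single loop over the counted entries is enough for cleanup. That is what lets may_follow_link() in the diff above replace its two path_put() calls with terminate_walk().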

2015-05-11 18:18:09

by Al Viro

Subject: [PATCH v3 085/110] namei: may_follow_link() - lift terminate_walk() on failures into caller

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6d4692d..51e2214 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -798,7 +798,6 @@ static inline int may_follow_link(struct nameidata *nd)
return 0;

audit_log_link_denied("follow_link", &nd->stack[0].link);
- terminate_walk(nd);
return -EACCES;
}

@@ -1980,8 +1979,10 @@ static int trailing_symlink(struct nameidata *nd)
{
const char *s;
int error = may_follow_link(nd);
- if (unlikely(error))
+ if (unlikely(error)) {
+ terminate_walk(nd);
return error;
+ }
nd->flags |= LOOKUP_PARENT;
nd->stack[0].name = NULL;
s = get_link(nd);
--
2.1.4

2015-05-11 18:18:06

by Al Viro

Subject: [PATCH v3 086/110] namei: split off filename_lookupat() with LOOKUP_PARENT

From: Al Viro <[email protected]>

new functions: filename_parentat() and path_parentat() resp.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 37 +++++++++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 51e2214..6f95bc9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2034,7 +2034,7 @@ static int path_lookupat(int dfd, const struct filename *name,
* be able to complete).
*/
err = path_init(dfd, name, flags, nd);
- if (!err && !(flags & LOOKUP_PARENT)) {
+ if (!err) {
while ((err = lookup_last(nd)) > 0) {
err = trailing_symlink(nd);
if (err)
@@ -2074,6 +2074,35 @@ static int filename_lookup(int dfd, struct filename *name,
return retval;
}

+/* Returns 0 and nd will be valid on success; returns error otherwise. */
+static int path_parentat(int dfd, const struct filename *name,
+ unsigned int flags, struct nameidata *nd)
+{
+ int err = path_init(dfd, name, flags | LOOKUP_PARENT, nd);
+ if (!err)
+ err = complete_walk(nd);
+ path_cleanup(nd);
+ return err;
+}
+
+static int filename_parentat(int dfd, struct filename *name,
+ unsigned int flags, struct nameidata *nd)
+{
+ int retval;
+ struct nameidata *saved_nd = set_nameidata(nd);
+
+ retval = path_parentat(dfd, name, flags | LOOKUP_RCU, nd);
+ if (unlikely(retval == -ECHILD))
+ retval = path_parentat(dfd, name, flags, nd);
+ if (unlikely(retval == -ESTALE))
+ retval = path_parentat(dfd, name, flags | LOOKUP_REVAL, nd);
+
+ if (likely(!retval))
+ audit_inode(name, nd->path.dentry, LOOKUP_PARENT);
+ restore_nameidata(saved_nd);
+ return retval;
+}
+
/* does lookup, returns the object with parent locked */
struct dentry *kern_path_locked(const char *name, struct path *path)
{
@@ -2085,7 +2114,7 @@ struct dentry *kern_path_locked(const char *name, struct path *path)
if (IS_ERR(filename))
return ERR_CAST(filename);

- err = filename_lookup(AT_FDCWD, filename, LOOKUP_PARENT, &nd);
+ err = filename_parentat(AT_FDCWD, filename, 0, &nd);
if (err) {
d = ERR_PTR(err);
goto out;
@@ -2255,7 +2284,7 @@ user_path_parent(int dfd, const char __user *path,
if (IS_ERR(s))
return s;

- error = filename_lookup(dfd, s, flags | LOOKUP_PARENT, &nd);
+ error = filename_parentat(dfd, s, flags, &nd);
if (error) {
putname(s);
return ERR_PTR(error);
@@ -3344,7 +3373,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
*/
lookup_flags &= LOOKUP_REVAL;

- error = filename_lookup(dfd, name, LOOKUP_PARENT|lookup_flags, &nd);
+ error = filename_parentat(dfd, name, lookup_flags, &nd);
if (error)
return ERR_PTR(error);

--
2.1.4
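
The retry ladder in filename_parentat() above (lockless attempt first, ref-walk on -ECHILD, revalidating walk on -ESTALE) can be shown in isolation. What follows is only a userspace sketch: the sk_ names, the flag values and the toy walker are made up for illustration and are not kernel interfaces.

```c
#include <errno.h>

/* Hypothetical flag values for this sketch; not the kernel's LOOKUP_* bits. */
enum { SK_LOOKUP_RCU = 1, SK_LOOKUP_REVAL = 2 };

/* One walk attempt; stands in for path_parentat(). */
typedef int (*sk_walk_fn)(unsigned int flags, void *ctx);

/*
 * Try the lockless walk first, fall back to a ref-walk on -ECHILD,
 * then to a revalidating walk on -ESTALE.
 */
static int sk_lookup_with_retries(sk_walk_fn walk, unsigned int flags, void *ctx)
{
	int ret = walk(flags | SK_LOOKUP_RCU, ctx);

	if (ret == -ECHILD)
		ret = walk(flags, ctx);
	if (ret == -ESTALE)
		ret = walk(flags | SK_LOOKUP_REVAL, ctx);
	return ret;
}

/* Records the flags of each attempt so the order can be inspected. */
struct sk_trace {
	int calls;
	unsigned int seen[3];
};

/* Toy walker: fails the RCU attempt, goes stale once, then succeeds. */
static int sk_toy_walk(unsigned int flags, void *ctx)
{
	struct sk_trace *t = ctx;

	if (t->calls < 3)
		t->seen[t->calls] = flags;
	t->calls++;
	if (flags & SK_LOOKUP_RCU)
		return -ECHILD;
	if (!(flags & SK_LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}
```

Calling sk_lookup_with_retries(sk_toy_walk, 0, &trace) with a zeroed trace makes three attempts, in the same order as the three path_parentat() calls in the patch.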

2015-05-11 18:18:03

by Al Viro

Subject: [PATCH v3 087/110] namei: get rid of nameidata->base

From: Al Viro <[email protected]>

we can do fdput() under rcu_read_lock() just fine; all we need to take
care of is fetching nd->inode value first.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6f95bc9..497d5f4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -503,7 +503,6 @@ struct nameidata {
int last_type;
unsigned depth;
int total_link_count;
- struct file *base;
struct saved {
struct path link;
void *cookie;
@@ -1872,7 +1871,6 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
nd->depth = 0;
- nd->base = NULL;
if (flags & LOOKUP_ROOT) {
struct dentry *root = nd->root.dentry;
struct inode *inode = root->d_inode;
@@ -1941,14 +1939,15 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,

nd->path = f.file->f_path;
if (flags & LOOKUP_RCU) {
- if (f.flags & FDPUT_FPUT)
- nd->base = f.file;
- nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
rcu_read_lock();
+ nd->inode = nd->path.dentry->d_inode;
+ nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
} else {
path_get(&nd->path);
- fdput(f);
+ nd->inode = nd->path.dentry->d_inode;
}
+ fdput(f);
+ goto done;
}

nd->inode = nd->path.dentry->d_inode;
@@ -1971,8 +1970,6 @@ static void path_cleanup(struct nameidata *nd)
path_put(&nd->root);
nd->root.mnt = NULL;
}
- if (unlikely(nd->base))
- fput(nd->base);
}

static int trailing_symlink(struct nameidata *nd)
--
2.1.4

2015-05-11 18:16:41

by Al Viro

Subject: [PATCH v3 088/110] namei: path_init() calling conventions change

From: Al Viro <[email protected]>

* lift link_path_walk() into callers; moving it down into path_init()
had been a mistake. Stack footprint, among other things...
* do _not_ call path_cleanup() after path_init() failure; on all failure
exits out of it we have nothing for path_cleanup() to do
* have path_init() return pathname or ERR_PTR(-E...)

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 69 +++++++++++++++++++++++++++++++-------------------------------
1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 497d5f4..06c7120 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1821,11 +1821,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
} while (unlikely(*name == '/'));
if (unlikely(!*name)) {
OK:
- /* called from path_init(), done */
+ /* pathname body, done */
if (!nd->depth)
return 0;
name = nd->stack[nd->depth - 1].name;
- /* called from trailing_symlink(), done */
+ /* trailing symlink, done */
if (!name)
return 0;
/* last component of nested symlink */
@@ -1862,8 +1862,8 @@ OK:
return err;
}

-static int path_init(int dfd, const struct filename *name, unsigned int flags,
- struct nameidata *nd)
+static const char *path_init(int dfd, const struct filename *name,
+ unsigned int flags, struct nameidata *nd)
{
int retval = 0;
const char *s = name->name;
@@ -1871,15 +1871,16 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
nd->depth = 0;
+ nd->total_link_count = 0;
if (flags & LOOKUP_ROOT) {
struct dentry *root = nd->root.dentry;
struct inode *inode = root->d_inode;
if (*s) {
if (!d_can_lookup(root))
- return -ENOTDIR;
+ return ERR_PTR(-ENOTDIR);
retval = inode_permission(inode, MAY_EXEC);
if (retval)
- return retval;
+ return ERR_PTR(retval);
}
nd->path = nd->root;
nd->inode = inode;
@@ -1890,7 +1891,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
} else {
path_get(&nd->path);
}
- goto done;
+ return s;
}

nd->root.mnt = NULL;
@@ -1926,14 +1927,14 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
struct dentry *dentry;

if (!f.file)
- return -EBADF;
+ return ERR_PTR(-EBADF);

dentry = f.file->f_path.dentry;

if (*s) {
if (!d_can_lookup(dentry)) {
fdput(f);
- return -ENOTDIR;
+ return ERR_PTR(-ENOTDIR);
}
}

@@ -1947,21 +1948,18 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
nd->inode = nd->path.dentry->d_inode;
}
fdput(f);
- goto done;
+ return s;
}

nd->inode = nd->path.dentry->d_inode;
if (!(flags & LOOKUP_RCU))
- goto done;
+ return s;
if (likely(!read_seqcount_retry(&nd->path.dentry->d_seq, nd->seq)))
- goto done;
+ return s;
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
rcu_read_unlock();
- return -ECHILD;
-done:
- nd->total_link_count = 0;
- return link_path_walk(s, nd);
+ return ERR_PTR(-ECHILD);
}

static void path_cleanup(struct nameidata *nd)
@@ -2014,23 +2012,12 @@ static inline int lookup_last(struct nameidata *nd)
static int path_lookupat(int dfd, const struct filename *name,
unsigned int flags, struct nameidata *nd)
{
+ const char *s = path_init(dfd, name, flags, nd);
int err;

- /*
- * Path walking is largely split up into 2 different synchronisation
- * schemes, rcu-walk and ref-walk (explained in
- * Documentation/filesystems/path-lookup.txt). These share much of the
- * path walk code, but some things particularly setup, cleanup, and
- * following mounts are sufficiently divergent that functions are
- * duplicated. Typically there is a function foo(), and its RCU
- * analogue, foo_rcu().
- *
- * -ECHILD is the error number of choice (just to avoid clashes) that
- * is returned if some aspect of an rcu-walk fails. Such an error must
- * be handled by restarting a traditional ref-walk (which will always
- * be able to complete).
- */
- err = path_init(dfd, name, flags, nd);
+ if (IS_ERR(s))
+ return PTR_ERR(s);
+ err = link_path_walk(s, nd);
if (!err) {
while ((err = lookup_last(nd)) > 0) {
err = trailing_symlink(nd);
@@ -2075,7 +2062,11 @@ static int filename_lookup(int dfd, struct filename *name,
static int path_parentat(int dfd, const struct filename *name,
unsigned int flags, struct nameidata *nd)
{
- int err = path_init(dfd, name, flags | LOOKUP_PARENT, nd);
+ const char *s = path_init(dfd, name, flags, nd);
+ int err;
+ if (IS_ERR(s))
+ return PTR_ERR(s);
+ err = link_path_walk(s, nd);
if (!err)
err = complete_walk(nd);
path_cleanup(nd);
@@ -2406,7 +2397,11 @@ static int
path_mountpoint(int dfd, const struct filename *name, struct path *path,
struct nameidata *nd, unsigned int flags)
{
- int err = path_init(dfd, name, flags, nd);
+ const char *s = path_init(dfd, name, flags, nd);
+ int err;
+ if (IS_ERR(s))
+ return PTR_ERR(s);
+ err = link_path_walk(s, nd);
if (unlikely(err))
goto out;

@@ -3266,6 +3261,7 @@ out:
static struct file *path_openat(int dfd, struct filename *pathname,
struct nameidata *nd, const struct open_flags *op, int flags)
{
+ const char *s;
struct file *file;
int opened = 0;
int error;
@@ -3281,7 +3277,12 @@ static struct file *path_openat(int dfd, struct filename *pathname,
goto out2;
}

- error = path_init(dfd, pathname, flags, nd);
+ s = path_init(dfd, pathname, flags, nd);
+ if (IS_ERR(s)) {
+ put_filp(file);
+ return ERR_CAST(s);
+ }
+ error = link_path_walk(s, nd);
if (unlikely(error))
goto out;

--
2.1.4
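
The new path_init() convention above (return the pathname to walk, or an errno encoded in the pointer) relies on the usual ERR_PTR()/IS_ERR()/PTR_ERR() idiom. Here is a rough userspace imitation of that idiom. The sk_ helpers are stand-ins written for this sketch, not the kernel's definitions, and sk_path_init() only models the -EBADF exit.

```c
#include <errno.h>
#include <stdint.h>

/* Errnos are small, so the top 4095 pointer values are reserved for them. */
#define SK_MAX_ERRNO 4095

static inline void *sk_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sk_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Stand-in for path_init(): the name to walk, or an encoded errno. */
static const char *sk_path_init(int dfd_ok, const char *name)
{
	if (!dfd_ok)
		return sk_err_ptr(-EBADF);
	return name;
}

/* Stand-in for link_path_walk(); a real walk would consume the components. */
static int sk_link_path_walk(const char *s)
{
	(void)s;
	return 0;
}

/* Caller shape after the patch: check the pointer, then walk. */
static int sk_lookup(int dfd_ok, const char *name)
{
	const char *s = sk_path_init(dfd_ok, name);

	if (sk_is_err(s))
		return (int)sk_ptr_err(s);
	return sk_link_path_walk(s);
}
```

The point of the shape is that the walk is now invoked from the caller's frame rather than from inside the init function, which is what the stack footprint remark in the commit message refers to.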

2015-05-11 18:18:01

by Al Viro

Subject: [PATCH v3 089/110] namei: lift link_path_walk() call out of trailing_symlink()

From: Al Viro <[email protected]>

Make trailing_symlink() return the pathname to traverse or ERR_PTR(-E...).
A subtle point is that for "magic" symlinks it now returns "" - that
leads to link_path_walk("", nd), which immediately returns 0, and
we are back to the treatment of the last component, at wherever the
damn thing has left us.

Reduces the stack footprint - link_path_walk() is now called on a
shallower stack.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 50 +++++++++++++++++++++++---------------------------
1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 06c7120..46f4266 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1970,24 +1970,24 @@ static void path_cleanup(struct nameidata *nd)
}
}

-static int trailing_symlink(struct nameidata *nd)
+static const char *trailing_symlink(struct nameidata *nd)
{
const char *s;
int error = may_follow_link(nd);
if (unlikely(error)) {
terminate_walk(nd);
- return error;
+ return ERR_PTR(error);
}
nd->flags |= LOOKUP_PARENT;
nd->stack[0].name = NULL;
s = get_link(nd);
if (unlikely(IS_ERR(s))) {
terminate_walk(nd);
- return PTR_ERR(s);
+ return s;
}
if (unlikely(!s))
- return 0;
- return link_path_walk(s, nd);
+ s = "";
+ return s;
}

static inline int lookup_last(struct nameidata *nd)
@@ -2017,12 +2017,12 @@ static int path_lookupat(int dfd, const struct filename *name,

if (IS_ERR(s))
return PTR_ERR(s);
- err = link_path_walk(s, nd);
- if (!err) {
- while ((err = lookup_last(nd)) > 0) {
- err = trailing_symlink(nd);
- if (err)
- break;
+ while (!(err = link_path_walk(s, nd))
+ && ((err = lookup_last(nd)) > 0)) {
+ s = trailing_symlink(nd);
+ if (IS_ERR(s)) {
+ err = PTR_ERR(s);
+ break;
}
}

@@ -2401,16 +2401,14 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
int err;
if (IS_ERR(s))
return PTR_ERR(s);
- err = link_path_walk(s, nd);
- if (unlikely(err))
- goto out;
-
- while ((err = mountpoint_last(nd, path)) > 0) {
- err = trailing_symlink(nd);
- if (err)
+ while (!(err = link_path_walk(s, nd)) &&
+ (err = mountpoint_last(nd, path)) > 0) {
+ s = trailing_symlink(nd);
+ if (IS_ERR(s)) {
+ err = PTR_ERR(s);
break;
+ }
}
-out:
path_cleanup(nd);
return err;
}
@@ -3282,17 +3280,15 @@ static struct file *path_openat(int dfd, struct filename *pathname,
put_filp(file);
return ERR_CAST(s);
}
- error = link_path_walk(s, nd);
- if (unlikely(error))
- goto out;
-
- while ((error = do_last(nd, file, op, &opened, pathname)) > 0) {
+ while (!(error = link_path_walk(s, nd)) &&
+ (error = do_last(nd, file, op, &opened, pathname)) > 0) {
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- error = trailing_symlink(nd);
- if (unlikely(error))
+ s = trailing_symlink(nd);
+ if (IS_ERR(s)) {
+ error = PTR_ERR(s);
break;
+ }
}
-out:
path_cleanup(nd);
out2:
if (!(opened & FILE_OPENED)) {
--
2.1.4
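
The loop shape introduced above, including the "" trick for "magic" symlinks, can be tried out on a toy namespace. This is a sketch only: the link table, the sk_ names and the way a jump is modelled are all invented for the example, and the ERR_PTR failure exit of trailing_symlink() is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_MAXLINKS 40

struct sk_link {
	const char *name;
	const char *target;   /* body of an ordinary symlink, or NULL */
	const char *jump_to;  /* where a "magic" link leaves us, or NULL */
};

static const struct sk_link sk_table[] = {
	{ "a", "b", NULL },
	{ "b", "file", NULL },
	{ "m", NULL, "file" },       /* "magic": jumps, has no body to walk */
	{ "loop", "loop", NULL },
};

struct sk_nd {
	const char *last;            /* last component seen so far */
	const struct sk_link *link;  /* symlink picked by sk_lookup_last() */
	int links;                   /* symlinks traversed so far */
};

static const struct sk_link *sk_find(const char *name)
{
	for (size_t i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++)
		if (!strcmp(sk_table[i].name, name))
			return &sk_table[i];
	return NULL;
}

/* Stand-in for link_path_walk(): "" is a no-op that returns 0. */
static int sk_link_path_walk(const char *s, struct sk_nd *nd)
{
	if (*s)
		nd->last = s;
	return 0;
}

/* Returns 1 if the last component is a symlink to follow, 0 if done. */
static int sk_lookup_last(struct sk_nd *nd)
{
	nd->link = sk_find(nd->last);
	if (!nd->link)
		return 0;
	if (++nd->links > SK_MAXLINKS)
		return -ELOOP;
	return 1;
}

/* Returns the body to walk next; "" after a jump. */
static const char *sk_trailing_symlink(struct sk_nd *nd)
{
	if (nd->link->jump_to) {
		nd->last = nd->link->jump_to;
		return "";
	}
	return nd->link->target;
}

static int sk_resolve(const char *name, struct sk_nd *nd)
{
	const char *s = name;
	int err;

	nd->last = "";
	nd->link = NULL;
	nd->links = 0;
	while (!(err = sk_link_path_walk(s, nd)) &&
	       (err = sk_lookup_last(nd)) > 0)
		s = sk_trailing_symlink(nd);
	return err;
}
```

With this table, resolving "a" follows two links and ends at "file", resolving "m" follows one link and ends at "file" via the empty walk, and resolving "loop" gives up with -ELOOP once the link count passes the limit.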

2015-05-11 18:16:42

by Al Viro

Subject: [PATCH v3 090/110] namei: lift terminate_walk() all the way up

From: Al Viro <[email protected]>

Lift it from link_path_walk(), trailing_symlink(), lookup_last(),
mountpoint_last(), complete_walk() and do_last(). A _lot_ of
those suckers merge.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 106 ++++++++++++++++++++-----------------------------------------
1 file changed, 34 insertions(+), 72 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 46f4266..27c3859 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -563,8 +563,6 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* to restart the path walk from the beginning in ref-walk mode.
*/

-static void terminate_walk(struct nameidata *nd);
-
/**
* unlazy_walk - try to switch to ref-walk mode.
* @nd: nameidata pathwalk data
@@ -673,10 +671,8 @@ static int complete_walk(struct nameidata *nd)
if (nd->flags & LOOKUP_RCU) {
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
- if (unlikely(unlazy_walk(nd, NULL))) {
- terminate_walk(nd);
+ if (unlikely(unlazy_walk(nd, NULL)))
return -ECHILD;
- }
}

if (likely(!(nd->flags & LOOKUP_JUMPED)))
@@ -692,7 +688,6 @@ static int complete_walk(struct nameidata *nd)
if (!status)
status = -ESTALE;

- terminate_walk(nd);
return status;
}

@@ -1858,7 +1853,6 @@ OK:
break;
}
}
- terminate_walk(nd);
return err;
}

@@ -1974,38 +1968,26 @@ static const char *trailing_symlink(struct nameidata *nd)
{
const char *s;
int error = may_follow_link(nd);
- if (unlikely(error)) {
- terminate_walk(nd);
+ if (unlikely(error))
return ERR_PTR(error);
- }
nd->flags |= LOOKUP_PARENT;
nd->stack[0].name = NULL;
s = get_link(nd);
- if (unlikely(IS_ERR(s))) {
- terminate_walk(nd);
- return s;
- }
- if (unlikely(!s))
- s = "";
- return s;
+ return s ? s : "";
}

static inline int lookup_last(struct nameidata *nd)
{
- int err;
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;

nd->flags &= ~LOOKUP_PARENT;
- err = walk_component(nd,
+ return walk_component(nd,
nd->flags & LOOKUP_FOLLOW
? nd->depth
? WALK_PUT | WALK_GET
: WALK_GET
: 0);
- if (err < 0)
- terminate_walk(nd);
- return err;
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -2025,16 +2007,14 @@ static int path_lookupat(int dfd, const struct filename *name,
break;
}
}
-
if (!err)
err = complete_walk(nd);

- if (!err && nd->flags & LOOKUP_DIRECTORY) {
- if (!d_can_lookup(nd->path.dentry)) {
- path_put(&nd->path);
+ if (!err && nd->flags & LOOKUP_DIRECTORY)
+ if (!d_can_lookup(nd->path.dentry))
err = -ENOTDIR;
- }
- }
+ if (err)
+ terminate_walk(nd);

path_cleanup(nd);
return err;
@@ -2069,6 +2049,8 @@ static int path_parentat(int dfd, const struct filename *name,
err = link_path_walk(s, nd);
if (!err)
err = complete_walk(nd);
+ if (err)
+ terminate_walk(nd);
path_cleanup(nd);
return err;
}
@@ -2320,10 +2302,8 @@ mountpoint_last(struct nameidata *nd, struct path *path)

/* If we're in rcuwalk, drop out of it to handle last component */
if (nd->flags & LOOKUP_RCU) {
- if (unlazy_walk(nd, NULL)) {
- error = -ECHILD;
- goto out;
- }
+ if (unlazy_walk(nd, NULL))
+ return -ECHILD;
}

nd->flags &= ~LOOKUP_PARENT;
@@ -2331,7 +2311,7 @@ mountpoint_last(struct nameidata *nd, struct path *path)
if (unlikely(nd->last_type != LAST_NORM)) {
error = handle_dots(nd, nd->last_type);
if (error)
- goto out;
+ return error;
dentry = dget(nd->path.dentry);
goto done;
}
@@ -2346,41 +2326,32 @@ mountpoint_last(struct nameidata *nd, struct path *path)
*/
dentry = d_alloc(dir, &nd->last);
if (!dentry) {
- error = -ENOMEM;
mutex_unlock(&dir->d_inode->i_mutex);
- goto out;
+ return -ENOMEM;
}
dentry = lookup_real(dir->d_inode, dentry, nd->flags);
- error = PTR_ERR(dentry);
if (IS_ERR(dentry)) {
mutex_unlock(&dir->d_inode->i_mutex);
- goto out;
+ return PTR_ERR(dentry);
}
}
mutex_unlock(&dir->d_inode->i_mutex);

done:
if (d_is_negative(dentry)) {
- error = -ENOENT;
dput(dentry);
- goto out;
+ return -ENOENT;
}
if (nd->depth)
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW);
- if (unlikely(error)) {
- if (error < 0)
- goto out;
+ if (unlikely(error))
return error;
- }
mntget(path->mnt);
follow_mount(path);
- error = 0;
-out:
- terminate_walk(nd);
- return error;
+ return 0;
}

/**
@@ -2409,6 +2380,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
break;
}
}
+ terminate_walk(nd);
path_cleanup(nd);
return err;
}
@@ -2982,10 +2954,8 @@ static int do_last(struct nameidata *nd,

if (nd->last_type != LAST_NORM) {
error = handle_dots(nd, nd->last_type);
- if (unlikely(error)) {
- terminate_walk(nd);
+ if (unlikely(error))
return error;
- }
goto finish_open;
}

@@ -2998,7 +2968,7 @@ static int do_last(struct nameidata *nd,
goto finish_lookup;

if (error < 0)
- goto out;
+ return error;

BUG_ON(nd->inode != dir->d_inode);
} else {
@@ -3013,10 +2983,9 @@ static int do_last(struct nameidata *nd,
return error;

audit_inode(name, dir, LOOKUP_PARENT);
- error = -EISDIR;
/* trailing slashes? */
- if (nd->last.name[nd->last.len])
- goto out;
+ if (unlikely(nd->last.name[nd->last.len]))
+ return -EISDIR;
}

retry_lookup:
@@ -3071,35 +3040,31 @@ retry_lookup:
got_write = false;
}

- error = -EEXIST;
- if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
- goto exit_dput;
+ if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
+ path_to_nameidata(&path, nd);
+ return -EEXIST;
+ }

error = follow_managed(&path, nd);
- if (error < 0)
- goto out;
+ if (unlikely(error < 0))
+ return error;

BUG_ON(nd->flags & LOOKUP_RCU);
inode = path.dentry->d_inode;
- error = -ENOENT;
- if (d_is_negative(path.dentry)) {
+ if (unlikely(d_is_negative(path.dentry))) {
path_to_nameidata(&path, nd);
- goto out;
+ return -ENOENT;
}
finish_lookup:
if (nd->depth)
put_link(nd);
error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW);
- if (unlikely(error)) {
- if (error < 0)
- goto out;
+ if (unlikely(error))
return error;
- }

if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
path_to_nameidata(&path, nd);
- error = -ELOOP;
- goto out;
+ return -ELOOP;
}

if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
@@ -3165,12 +3130,8 @@ out:
if (got_write)
mnt_drop_write(nd->path.mnt);
path_put(&save_parent);
- terminate_walk(nd);
return error;

-exit_dput:
- path_put_conditional(&path, nd);
- goto out;
exit_fput:
fput(file);
goto out;
@@ -3289,6 +3250,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
break;
}
}
+ terminate_walk(nd);
path_cleanup(nd);
out2:
if (!(opened & FILE_OPENED)) {
--
2.1.4
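
The effect of lifting terminate_walk() into the top-level callers is that helpers just return an error and a single place cleans up. The following is a toy model of that arrangement and nothing more: the sk_ structure and its counters are hypothetical, and "references" here are plain integers.

```c
#include <errno.h>

struct sk_walk {
	int refs_held;  /* things a helper has grabbed along the way */
	int cleanups;   /* how many times the cleanup ran */
};

/* Stand-in for terminate_walk(): drop whatever the walk still holds. */
static void sk_terminate(struct sk_walk *w)
{
	w->refs_held = 0;
	w->cleanups++;
}

/* A helper: grabs something and may fail. It never cleans up itself. */
static int sk_step(struct sk_walk *w, int fail)
{
	w->refs_held++;
	return fail ? -ENOENT : 0;
}

/* Top-level caller: the only place that calls the cleanup on failure. */
static int sk_run(struct sk_walk *w, const int *fail, int n)
{
	int err = 0;

	for (int i = 0; i < n && !err; i++)
		err = sk_step(w, fail[i]);
	if (err)
		sk_terminate(w);
	return err;
}
```

This mirrors the conditional form used in path_lookupat() and path_parentat() above; path_mountpoint() and path_openat() in the patch call terminate_walk() unconditionally instead.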

2015-05-11 18:16:39

by Al Viro

Subject: [PATCH v3 091/110] link_path_walk: use explicit returns for failure exits

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 27c3859..756e150 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1772,7 +1772,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)

err = may_lookup(nd);
if (err)
- break;
+ return err;

hash_len = hash_name(name);

@@ -1794,7 +1794,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
struct qstr this = { { .hash_len = hash_len }, .name = name };
err = parent->d_op->d_hash(parent, &this);
if (err < 0)
- break;
+ return err;
hash_len = this.hash_len;
name = this.name;
}
@@ -1829,15 +1829,13 @@ OK:
err = walk_component(nd, WALK_GET);
}
if (err < 0)
- break;
+ return err;

if (err) {
const char *s = get_link(nd);

- if (unlikely(IS_ERR(s))) {
- err = PTR_ERR(s);
- break;
- }
+ if (unlikely(IS_ERR(s)))
+ return PTR_ERR(s);
err = 0;
if (unlikely(!s)) {
/* jumped */
@@ -1848,12 +1846,9 @@ OK:
continue;
}
}
- if (!d_can_lookup(nd->path.dentry)) {
- err = -ENOTDIR;
- break;
- }
+ if (unlikely(!d_can_lookup(nd->path.dentry)))
+ return -ENOTDIR;
}
- return err;
}

static const char *path_init(int dfd, const struct filename *name,
--
2.1.4

2015-05-11 18:14:14

by Al Viro

Subject: [PATCH v3 092/110] namei: explicitly pass seq number to unlazy_walk() when dentry != NULL

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 756e150..97315df 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,13 +567,14 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* unlazy_walk - try to switch to ref-walk mode.
* @nd: nameidata pathwalk data
* @dentry: child of nd->path.dentry or NULL
+ * @seq: seq number to check dentry against
* Returns: 0 on success, -ECHILD on failure
*
* unlazy_walk attempts to legitimize the current nd->path, nd->root and dentry
* for ref-walk mode. @dentry must be a path found by a do_lookup call on
* @nd or NULL. Must be called from rcu-walk context.
*/
-static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
+static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq)
{
struct fs_struct *fs = current->fs;
struct dentry *parent = nd->path.dentry;
@@ -615,7 +616,7 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
} else {
if (!lockref_get_not_dead(&dentry->d_lockref))
goto out;
- if (read_seqcount_retry(&dentry->d_seq, nd->seq))
+ if (read_seqcount_retry(&dentry->d_seq, seq))
goto drop_dentry;
}

@@ -671,7 +672,7 @@ static int complete_walk(struct nameidata *nd)
if (nd->flags & LOOKUP_RCU) {
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
- if (unlikely(unlazy_walk(nd, NULL)))
+ if (unlikely(unlazy_walk(nd, NULL, 0)))
return -ECHILD;
}

@@ -1451,7 +1452,7 @@ static int lookup_fast(struct nameidata *nd,
if (likely(__follow_mount_rcu(nd, path, inode)))
return 0;
unlazy:
- if (unlazy_walk(nd, dentry))
+ if (unlazy_walk(nd, dentry, nd->seq))
return -ECHILD;
} else {
dentry = __d_lookup(parent, &nd->last);
@@ -1511,7 +1512,7 @@ static inline int may_lookup(struct nameidata *nd)
int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
if (err != -ECHILD)
return err;
- if (unlazy_walk(nd, NULL))
+ if (unlazy_walk(nd, NULL, 0))
return -ECHILD;
}
return inode_permission(nd->inode, MAY_EXEC);
@@ -1552,7 +1553,7 @@ static int pick_link(struct nameidata *nd, struct path *link)
}
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != link->mnt ||
- unlazy_walk(nd, link->dentry))) {
+ unlazy_walk(nd, link->dentry, nd->seq))) {
return -ECHILD;
}
}
@@ -2297,7 +2298,7 @@ mountpoint_last(struct nameidata *nd, struct path *path)

/* If we're in rcuwalk, drop out of it to handle last component */
if (nd->flags & LOOKUP_RCU) {
- if (unlazy_walk(nd, NULL))
+ if (unlazy_walk(nd, NULL, 0))
return -ECHILD;
}

--
2.1.4

2015-05-11 18:14:08

by Al Viro

Subject: [PATCH v3 093/110] namei: don't mangle nd->seq in lookup_fast()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 97315df..e321450 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1134,7 +1134,7 @@ static inline int managed_dentry_rcu(struct dentry *dentry)
* we meet a managed dentry that would need blocking.
*/
static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
- struct inode **inode)
+ struct inode **inode, unsigned *seqp)
{
for (;;) {
struct mount *mounted;
@@ -1161,7 +1161,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
path->mnt = &mounted->mnt;
path->dentry = mounted->mnt.mnt_root;
nd->flags |= LOOKUP_JUMPED;
- nd->seq = read_seqcount_begin(&path->dentry->d_seq);
+ *seqp = read_seqcount_begin(&path->dentry->d_seq);
/*
* Update the inode too. We don't need to re-check the
* dentry sequence number here after this d_inode read,
@@ -1397,7 +1397,8 @@ static struct dentry *__lookup_hash(struct qstr *name,
* It _is_ time-critical.
*/
static int lookup_fast(struct nameidata *nd,
- struct path *path, struct inode **inode)
+ struct path *path, struct inode **inode,
+ unsigned *seqp)
{
struct vfsmount *mnt = nd->path.mnt;
struct dentry *dentry, *parent = nd->path.dentry;
@@ -1437,8 +1438,8 @@ static int lookup_fast(struct nameidata *nd,
*/
if (__read_seqcount_retry(&parent->d_seq, nd->seq))
return -ECHILD;
- nd->seq = seq;

+ *seqp = seq;
if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
status = d_revalidate(dentry, nd->flags);
if (unlikely(status <= 0)) {
@@ -1449,10 +1450,10 @@ static int lookup_fast(struct nameidata *nd,
}
path->mnt = mnt;
path->dentry = dentry;
- if (likely(__follow_mount_rcu(nd, path, inode)))
+ if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
return 0;
unlazy:
- if (unlazy_walk(nd, dentry, nd->seq))
+ if (unlazy_walk(nd, dentry, seq))
return -ECHILD;
} else {
dentry = __d_lookup(parent, &nd->last);
@@ -1543,7 +1544,7 @@ static void terminate_walk(struct nameidata *nd)
put_link(nd);
}

-static int pick_link(struct nameidata *nd, struct path *link)
+static int pick_link(struct nameidata *nd, struct path *link, unsigned seq)
{
int error;
struct saved *last;
@@ -1553,7 +1554,7 @@ static int pick_link(struct nameidata *nd, struct path *link)
}
if (nd->flags & LOOKUP_RCU) {
if (unlikely(nd->path.mnt != link->mnt ||
- unlazy_walk(nd, link->dentry, nd->seq))) {
+ unlazy_walk(nd, link->dentry, seq))) {
return -ECHILD;
}
}
@@ -1577,13 +1578,14 @@ static int pick_link(struct nameidata *nd, struct path *link)
* so we keep a cache of "no, this doesn't need follow_link"
* for the common case.
*/
-static inline int should_follow_link(struct nameidata *nd, struct path *link, int follow)
+static inline int should_follow_link(struct nameidata *nd, struct path *link,
+ int follow, unsigned seq)
{
if (likely(!d_is_symlink(link->dentry)))
return 0;
if (!follow)
return 0;
- return pick_link(nd, link);
+ return pick_link(nd, link, seq);
}

enum {WALK_GET = 1, WALK_PUT = 2};
@@ -1592,6 +1594,7 @@ static int walk_component(struct nameidata *nd, int flags)
{
struct path path;
struct inode *inode;
+ unsigned seq;
int err;
/*
* "." and ".." are special - ".." especially so because it has
@@ -1604,7 +1607,7 @@ static int walk_component(struct nameidata *nd, int flags)
put_link(nd);
return err;
}
- err = lookup_fast(nd, &path, &inode);
+ err = lookup_fast(nd, &path, &inode, &seq);
if (unlikely(err)) {
if (err < 0)
return err;
@@ -1614,6 +1617,7 @@ static int walk_component(struct nameidata *nd, int flags)
return err;

inode = path.dentry->d_inode;
+ seq = 0; /* we are already out of RCU mode */
err = -ENOENT;
if (d_is_negative(path.dentry))
goto out_path_put;
@@ -1621,11 +1625,12 @@ static int walk_component(struct nameidata *nd, int flags)

if (flags & WALK_PUT)
put_link(nd);
- err = should_follow_link(nd, &path, flags & WALK_GET);
+ err = should_follow_link(nd, &path, flags & WALK_GET, seq);
if (unlikely(err))
return err;
path_to_nameidata(&path, nd);
nd->inode = inode;
+ nd->seq = seq;
return 0;

out_path_put:
@@ -2342,7 +2347,7 @@ done:
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
- error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW);
+ error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW, 0);
if (unlikely(error))
return error;
mntget(path->mnt);
@@ -2939,6 +2944,7 @@ static int do_last(struct nameidata *nd,
bool will_truncate = (open_flag & O_TRUNC) != 0;
bool got_write = false;
int acc_mode = op->acc_mode;
+ unsigned seq;
struct inode *inode;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
struct path path;
@@ -2959,7 +2965,7 @@ static int do_last(struct nameidata *nd,
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
/* we _can_ be in RCU mode here */
- error = lookup_fast(nd, &path, &inode);
+ error = lookup_fast(nd, &path, &inode, &seq);
if (likely(!error))
goto finish_lookup;

@@ -3047,6 +3053,7 @@ retry_lookup:

BUG_ON(nd->flags & LOOKUP_RCU);
inode = path.dentry->d_inode;
+ seq = 0; /* out of RCU mode, so the value doesn't matter */
if (unlikely(d_is_negative(path.dentry))) {
path_to_nameidata(&path, nd);
return -ENOENT;
@@ -3054,7 +3061,7 @@ retry_lookup:
finish_lookup:
if (nd->depth)
put_link(nd);
- error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW);
+ error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW, seq);
if (unlikely(error))
return error;

@@ -3072,6 +3079,7 @@ finish_lookup:

}
nd->inode = inode;
+ nd->seq = seq;
/* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */
finish_open:
error = complete_walk(nd);
--
2.1.4

2015-05-11 18:14:07

by Al Viro

Subject: [PATCH v3 094/110] namei: store inode in nd->stack[]

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e321450..366b0f3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -507,6 +507,7 @@ struct nameidata {
struct path link;
void *cookie;
const char *name;
+ struct inode *inode;
} *stack, internal[EMBEDDED_LEVELS];
};

@@ -746,7 +747,7 @@ void nd_jump_link(struct path *path)
static inline void put_link(struct nameidata *nd)
{
struct saved *last = nd->stack + --nd->depth;
- struct inode *inode = last->link.dentry->d_inode;
+ struct inode *inode = last->inode;
if (last->cookie && inode->i_op->put_link)
inode->i_op->put_link(last->link.dentry, last->cookie);
path_put(&last->link);
@@ -779,7 +780,7 @@ static inline int may_follow_link(struct nameidata *nd)
return 0;

/* Allowed if owner and follower match. */
- inode = nd->stack[0].link.dentry->d_inode;
+ inode = nd->stack[0].inode;
if (uid_eq(current_cred()->fsuid, inode->i_uid))
return 0;

@@ -870,7 +871,7 @@ const char *get_link(struct nameidata *nd)
{
struct saved *last = nd->stack + nd->depth - 1;
struct dentry *dentry = last->link.dentry;
- struct inode *inode = dentry->d_inode;
+ struct inode *inode = last->inode;
int error;
const char *res;

@@ -1569,6 +1570,7 @@ static int pick_link(struct nameidata *nd, struct path *link, unsigned seq)
last = nd->stack + nd->depth++;
last->link = *link;
last->cookie = NULL;
+ last->inode = d_backing_inode(link->dentry);
return 1;
}

--
2.1.4

2015-05-11 18:14:11

by Al Viro

Subject: [PATCH v3 095/110] VFS: Handle lower layer dentry/inode in pathwalk

From: David Howells <[email protected]>

Make use of d_backing_inode() in pathwalk to gain access to an
inode or dentry that's on a lower layer.

Signed-off-by: David Howells <[email protected]>
---
fs/namei.c | 10 +++++-----
fs/open.c | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 366b0f3..bcacb31 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1423,7 +1423,7 @@ static int lookup_fast(struct nameidata *nd,
* This sequence count validates that the inode matches
* the dentry name information from lookup.
*/
- *inode = dentry->d_inode;
+ *inode = d_backing_inode(dentry);
negative = d_is_negative(dentry);
if (read_seqcount_retry(&dentry->d_seq, seq))
return -ECHILD;
@@ -1483,7 +1483,7 @@ unlazy:
path->dentry = dentry;
err = follow_managed(path, nd);
if (likely(!err))
- *inode = path->dentry->d_inode;
+ *inode = d_backing_inode(path->dentry);
return err;

need_lookup:
@@ -1618,7 +1618,7 @@ static int walk_component(struct nameidata *nd, int flags)
if (err < 0)
return err;

- inode = path.dentry->d_inode;
+ inode = d_backing_inode(path.dentry);
seq = 0; /* we are already out of RCU mode */
err = -ENOENT;
if (d_is_negative(path.dentry))
@@ -2471,7 +2471,7 @@ EXPORT_SYMBOL(__check_sticky);
*/
static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
{
- struct inode *inode = victim->d_inode;
+ struct inode *inode = d_backing_inode(victim);
int error;

if (d_is_negative(victim))
@@ -3054,7 +3054,7 @@ retry_lookup:
return error;

BUG_ON(nd->flags & LOOKUP_RCU);
- inode = path.dentry->d_inode;
+ inode = d_backing_inode(path.dentry);
seq = 0; /* out of RCU mode, so the value doesn't matter */
if (unlikely(d_is_negative(path.dentry))) {
path_to_nameidata(&path, nd);
diff --git a/fs/open.c b/fs/open.c
index 98e5a52..e0250bd 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -367,7 +367,7 @@ retry:
if (res)
goto out;

- inode = path.dentry->d_inode;
+ inode = d_backing_inode(path.dentry);

if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
/*
--
2.1.4

2015-05-11 18:14:01

by Al Viro

Subject: [PATCH v3 096/110] namei: pick_link() callers already have inode

From: Al Viro <[email protected]>

no need to refetch the inode (or, once we move unlazy out of there, to recheck ->d_seq).

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bcacb31..33b655d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1545,7 +1545,8 @@ static void terminate_walk(struct nameidata *nd)
put_link(nd);
}

-static int pick_link(struct nameidata *nd, struct path *link, unsigned seq)
+static int pick_link(struct nameidata *nd, struct path *link,
+ struct inode *inode, unsigned seq)
{
int error;
struct saved *last;
@@ -1570,7 +1571,7 @@ static int pick_link(struct nameidata *nd, struct path *link, unsigned seq)
last = nd->stack + nd->depth++;
last->link = *link;
last->cookie = NULL;
- last->inode = d_backing_inode(link->dentry);
+ last->inode = inode;
return 1;
}

@@ -1581,13 +1582,14 @@ static int pick_link(struct nameidata *nd, struct path *link, unsigned seq)
* for the common case.
*/
static inline int should_follow_link(struct nameidata *nd, struct path *link,
- int follow, unsigned seq)
+ int follow,
+ struct inode *inode, unsigned seq)
{
if (likely(!d_is_symlink(link->dentry)))
return 0;
if (!follow)
return 0;
- return pick_link(nd, link, seq);
+ return pick_link(nd, link, inode, seq);
}

enum {WALK_GET = 1, WALK_PUT = 2};
@@ -1627,7 +1629,7 @@ static int walk_component(struct nameidata *nd, int flags)

if (flags & WALK_PUT)
put_link(nd);
- err = should_follow_link(nd, &path, flags & WALK_GET, seq);
+ err = should_follow_link(nd, &path, flags & WALK_GET, inode, seq);
if (unlikely(err))
return err;
path_to_nameidata(&path, nd);
@@ -2349,7 +2351,8 @@ done:
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
- error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW, 0);
+ error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW,
+ d_backing_inode(dentry), 0);
if (unlikely(error))
return error;
mntget(path->mnt);
@@ -3063,7 +3066,8 @@ retry_lookup:
finish_lookup:
if (nd->depth)
put_link(nd);
- error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW, seq);
+ error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW,
+ inode, seq);
if (unlikely(error))
return error;

--
2.1.4

2015-05-11 18:13:59

by Al Viro

Subject: [PATCH v3 097/110] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()

From: NeilBrown <[email protected]>

This allows MAY_NOT_BLOCK to be passed, in RCU-walk mode, through
the new avc_has_perm_flags() to avc_audit() and thence to slow_avc_audit().

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
security/selinux/avc.c | 18 +++++++++++++++++-
security/selinux/hooks.c | 2 +-
security/selinux/include/avc.h | 9 +++++++--
3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 3c17dda..0b122b1 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -761,7 +761,23 @@ int avc_has_perm(u32 ssid, u32 tsid, u16 tclass,

rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);

- rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
+ rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata, 0);
+ if (rc2)
+ return rc2;
+ return rc;
+}
+
+int avc_has_perm_flags(u32 ssid, u32 tsid, u16 tclass,
+ u32 requested, struct common_audit_data *auditdata,
+ int flags)
+{
+ struct av_decision avd;
+ int rc, rc2;
+
+ rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);
+
+ rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc,
+ auditdata, flags);
if (rc2)
return rc2;
return rc;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 8016229..d56a829 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1564,7 +1564,7 @@ static int cred_has_capability(const struct cred *cred,

rc = avc_has_perm_noaudit(sid, sid, sclass, av, 0, &avd);
if (audit == SECURITY_CAP_AUDIT) {
- int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad);
+ int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad, 0);
if (rc2)
return rc2;
}
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index ddf8eec..5973c32 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -130,7 +130,8 @@ static inline int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd,
int result,
- struct common_audit_data *a)
+ struct common_audit_data *a,
+ int flags)
{
u32 audited, denied;
audited = avc_audit_required(requested, avd, result, 0, &denied);
@@ -138,7 +139,7 @@ static inline int avc_audit(u32 ssid, u32 tsid,
return 0;
return slow_avc_audit(ssid, tsid, tclass,
requested, audited, denied, result,
- a, 0);
+ a, flags);
}

#define AVC_STRICT 1 /* Ignore permissive mode. */
@@ -150,6 +151,10 @@ int avc_has_perm_noaudit(u32 ssid, u32 tsid,
int avc_has_perm(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct common_audit_data *auditdata);
+int avc_has_perm_flags(u32 ssid, u32 tsid,
+ u16 tclass, u32 requested,
+ struct common_audit_data *auditdata,
+ int flags);

u32 avc_policy_seqno(void);

--
2.1.4
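
The flag plumbing in this patch can be pictured with a small userspace sketch. It is not the SELinux code: `NOT_BLOCK`, `audit()`, `has_perm()` and `has_perm_flags()` are hypothetical stand-ins, and the sketch assumes the audit step refuses with -ECHILD when told it may not block. Unlike the patch, which duplicates the function body, the sketch lets the old entry point call the new one with flags of 0.

```c
#include <assert.h>
#include <errno.h>

#define NOT_BLOCK 0x1	/* hypothetical analogue of MAY_NOT_BLOCK */

static int audit_records;	/* number of audit records written */

/* Hypothetical audit step: writing a record may sleep. */
static int audit(int denied, unsigned flags)
{
	assert((flags & ~NOT_BLOCK) == 0);
	if (!denied)
		return 0;
	if (flags & NOT_BLOCK)
		return -ECHILD;	/* cannot sleep here; caller should retry */
	audit_records++;
	return 0;
}

/* Flag-taking variant: the decision is the same, only auditing differs. */
static int has_perm_flags(int allowed, unsigned flags)
{
	int rc = allowed ? 0 : -EACCES;
	int rc2 = audit(rc != 0, flags);

	if (rc2)
		return rc2;
	return rc;
}

/* Existing callers keep the old signature and the old behaviour. */
static int has_perm(int allowed)
{
	return has_perm_flags(allowed, 0);
}
```

A denied check made with `NOT_BLOCK` set returns -ECHILD without writing a record; the same check with flags of 0 writes the record and returns -EACCES.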

2015-05-11 18:12:31

by Al Viro

Subject: [PATCH v3 098/110] security: make inode_follow_link RCU-walk aware

From: NeilBrown <[email protected]>

inode_follow_link now takes an inode and an rcu flag as well as the
dentry.

inode is used in preference to d_backing_inode(dentry), particularly
in RCU-walk mode.

selinux_inode_follow_link() gets dentry_has_perm() and
inode_has_perm() open-coded into it so that it can call
avc_has_perm_flags() in a way that is safe if LOOKUP_RCU is set.

Calling avc_has_perm_flags() with rcu_read_lock() held means
that when avc_has_perm_noaudit calls avc_compute_av(), the attempt
to rcu_read_unlock() before calling security_compute_av() will not
actually drop the RCU read-lock.

However, as security_compute_av() is completely in a read_lock()ed
region, it should be safe with the RCU read-lock held.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 5 +++--
include/linux/security.h | 12 +++++++++---
security/capability.c | 3 ++-
security/security.c | 7 ++++---
security/selinux/hooks.c | 16 ++++++++++++++--
5 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 33b655d..0fa7af2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -881,8 +881,9 @@ const char *get_link(struct nameidata *nd)

touch_atime(&last->link);

- error = security_inode_follow_link(dentry);
- if (error)
+ error = security_inode_follow_link(dentry, inode,
+ nd->flags & LOOKUP_RCU);
+ if (unlikely(error))
return ERR_PTR(error);

nd->last_type = LAST_BIND;
diff --git a/include/linux/security.h b/include/linux/security.h
index 62a6620..52febde 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -476,6 +476,8 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* @inode_follow_link:
* Check permission to follow a symbolic link when looking up a pathname.
* @dentry contains the dentry structure for the link.
+ * @inode contains the inode, which itself is not stable in RCU-walk
+ * @rcu indicates whether we are in RCU-walk mode.
* Return 0 if permission is granted.
* @inode_permission:
* Check permission before accessing an inode. This hook is called by the
@@ -1551,7 +1553,8 @@ struct security_operations {
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
int (*inode_readlink) (struct dentry *dentry);
- int (*inode_follow_link) (struct dentry *dentry);
+ int (*inode_follow_link) (struct dentry *dentry, struct inode *inode,
+ bool rcu);
int (*inode_permission) (struct inode *inode, int mask);
int (*inode_setattr) (struct dentry *dentry, struct iattr *attr);
int (*inode_getattr) (const struct path *path);
@@ -1837,7 +1840,8 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags);
int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry);
+int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
+ bool rcu);
int security_inode_permission(struct inode *inode, int mask);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
int security_inode_getattr(const struct path *path);
@@ -2239,7 +2243,9 @@ static inline int security_inode_readlink(struct dentry *dentry)
return 0;
}

-static inline int security_inode_follow_link(struct dentry *dentry)
+static inline int security_inode_follow_link(struct dentry *dentry,
+ struct inode *inode,
+ bool rcu)
{
return 0;
}
diff --git a/security/capability.c b/security/capability.c
index fae99db..7d3f38f 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,7 +209,8 @@ static int cap_inode_readlink(struct dentry *dentry)
return 0;
}

-static int cap_inode_follow_link(struct dentry *dentry)
+static int cap_inode_follow_link(struct dentry *dentry, struct inode *inode,
+ bool rcu)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index d7c30b0..04c8fec 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,12 @@ int security_inode_readlink(struct dentry *dentry)
return security_ops->inode_readlink(dentry);
}

-int security_inode_follow_link(struct dentry *dentry)
+int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
+ bool rcu)
{
- if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
+ if (unlikely(IS_PRIVATE(inode)))
return 0;
- return security_ops->inode_follow_link(dentry);
+ return security_ops->inode_follow_link(dentry, inode, rcu);
}

int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d56a829..ffa5a64 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2861,11 +2861,23 @@ static int selinux_inode_readlink(struct dentry *dentry)
return dentry_has_perm(cred, dentry, FILE__READ);
}

-static int selinux_inode_follow_link(struct dentry *dentry)
+static int selinux_inode_follow_link(struct dentry *dentry, struct inode *inode,
+ bool rcu)
{
const struct cred *cred = current_cred();
+ struct common_audit_data ad;
+ struct inode_security_struct *isec;
+ u32 sid;

- return dentry_has_perm(cred, dentry, FILE__READ);
+ validate_creds(cred);
+
+ ad.type = LSM_AUDIT_DATA_DENTRY;
+ ad.u.dentry = dentry;
+ sid = cred_sid(cred);
+ isec = inode->i_security;
+
+ return avc_has_perm_flags(sid, isec->sid, isec->sclass, FILE__READ, &ad,
+ rcu ? MAY_NOT_BLOCK : 0);
}

static noinline int audit_inode_permission(struct inode *inode,
--
2.1.4

2015-05-11 18:12:23

by Al Viro

Subject: [PATCH v3 099/110] switch ->put_link() from dentry to inode

From: Al Viro <[email protected]>

only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.

Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/Locking | 2 +-
Documentation/filesystems/vfs.txt | 2 +-
drivers/staging/lustre/lustre/llite/symlink.c | 2 +-
fs/configfs/symlink.c | 2 +-
fs/f2fs/namei.c | 2 +-
fs/fuse/dir.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hppfs/hppfs.c | 8 ++++----
fs/kernfs/symlink.c | 2 +-
fs/libfs.c | 2 +-
fs/namei.c | 13 +++++++------
fs/overlayfs/inode.c | 4 ++--
fs/proc/inode.c | 2 +-
include/linux/fs.h | 6 +++---
mm/shmem.c | 2 +-
15 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 5b5b4f5..6a34a0f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -51,7 +51,7 @@ prototypes:
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
const char *(*follow_link) (struct dentry *, void **);
- void (*put_link) (struct dentry *, void *);
+ void (*put_link) (struct inode *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 0dec8c8..542d935 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -351,7 +351,7 @@ struct inode_operations {
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
const char *(*follow_link) (struct dentry *, void **);
- void (*put_link) (struct dentry *, void *);
+ void (*put_link) (struct inode *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index f3be3bf..69b2036 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -141,7 +141,7 @@ static const char *ll_follow_link(struct dentry *dentry, void **cookie)
return symname;
}

-static void ll_put_link(struct dentry *dentry, void *cookie)
+static void ll_put_link(struct inode *unused, void *cookie)
{
ptlrpc_req_finished(cookie);
}
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index 0ace756..bc464c2 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -296,7 +296,7 @@ static const char *configfs_follow_link(struct dentry *dentry, void **cookie)
return ERR_PTR(error);
}

-static void configfs_put_link(struct dentry *dentry, void *cookie)
+static void configfs_put_link(struct inode *unused, void *cookie)
{
free_page((unsigned long)cookie);
}
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index cd05a7c..71765d0 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -301,7 +301,7 @@ static const char *f2fs_follow_link(struct dentry *dentry, void **cookie)
const char *link = page_follow_link_light(dentry, cookie);
if (!IS_ERR(link) && !*link) {
/* this is broken symlink case */
- page_put_link(dentry, *cookie);
+ page_put_link(NULL, *cookie);
link = ERR_PTR(-ENOENT);
}
return link;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d5cdef8..9e704c1 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1395,7 +1395,7 @@ static const char *fuse_follow_link(struct dentry *dentry, void **cookie)
return link;
}

-static void fuse_put_link(struct dentry *dentry, void *cookie)
+static void fuse_put_link(struct inode *unused, void *cookie)
{
free_page((unsigned long) cookie);
}
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 7b6ed7a..4a437ab 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -915,7 +915,7 @@ static const char *hostfs_follow_link(struct dentry *dentry, void **cookie)
return *cookie = link;
}

-static void hostfs_put_link(struct dentry *dentry, void *cookie)
+static void hostfs_put_link(struct inode *unused, void *cookie)
{
__putname(cookie);
}
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index 15a774e..2867837 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -649,12 +649,12 @@ static const char *hppfs_follow_link(struct dentry *dentry, void **cookie)
return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, cookie);
}

-static void hppfs_put_link(struct dentry *dentry, void *cookie)
+static void hppfs_put_link(struct inode *inode, void *cookie)
{
- struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
+ struct inode *proc_inode = d_inode(HPPFS_I(inode)->proc_dentry);

- if (d_inode(proc_dentry)->i_op->put_link)
- d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
+ if (proc_inode->i_op->put_link)
+ proc_inode->i_op->put_link(proc_inode, cookie);
}

static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 366c5a1..f6aa2e5 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -126,7 +126,7 @@ static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie)
return *cookie = (char *)page;
}

-static void kernfs_iop_put_link(struct dentry *dentry, void *cookie)
+static void kernfs_iop_put_link(struct inode *unused, void *cookie)
{
free_page((unsigned long)cookie);
}
diff --git a/fs/libfs.c b/fs/libfs.c
index c5f3373..01c337b 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,7 +1024,7 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}
EXPORT_SYMBOL(noop_fsync);

-void kfree_put_link(struct dentry *dentry, void *cookie)
+void kfree_put_link(struct inode *unused, void *cookie)
{
kfree(cookie);
}
diff --git a/fs/namei.c b/fs/namei.c
index 0fa7af2..4303404 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -749,7 +749,7 @@ static inline void put_link(struct nameidata *nd)
struct saved *last = nd->stack + --nd->depth;
struct inode *inode = last->inode;
if (last->cookie && inode->i_op->put_link)
- inode->i_op->put_link(last->link.dentry, last->cookie);
+ inode->i_op->put_link(inode, last->cookie);
path_put(&last->link);
}

@@ -4444,17 +4444,18 @@ EXPORT_SYMBOL(readlink_copy);
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
void *cookie;
- const char *link = dentry->d_inode->i_link;
+ struct inode *inode = d_inode(dentry);
+ const char *link = inode->i_link;
int res;

if (!link) {
- link = dentry->d_inode->i_op->follow_link(dentry, &cookie);
+ link = inode->i_op->follow_link(dentry, &cookie);
if (IS_ERR(link))
return PTR_ERR(link);
}
res = readlink_copy(buffer, buflen, link);
- if (dentry->d_inode->i_op->put_link)
- dentry->d_inode->i_op->put_link(dentry, cookie);
+ if (inode->i_op->put_link)
+ inode->i_op->put_link(inode, cookie);
return res;
}
EXPORT_SYMBOL(generic_readlink);
@@ -4496,7 +4497,7 @@ const char *page_follow_link_light(struct dentry *dentry, void **cookie)
}
EXPORT_SYMBOL(page_follow_link_light);

-void page_put_link(struct dentry *dentry, void *cookie)
+void page_put_link(struct inode *unused, void *cookie)
{
struct page *page = cookie;
kunmap(page);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 9986833..308379b 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -174,7 +174,7 @@ static const char *ovl_follow_link(struct dentry *dentry, void **cookie)
return ret;
}

-static void ovl_put_link(struct dentry *dentry, void *c)
+static void ovl_put_link(struct inode *unused, void *c)
{
struct inode *realinode;
struct ovl_link_data *data = c;
@@ -183,7 +183,7 @@ static void ovl_put_link(struct dentry *dentry, void *c)
return;

realinode = data->realdentry->d_inode;
- realinode->i_op->put_link(data->realdentry, data->cookie);
+ realinode->i_op->put_link(realinode, data->cookie);
kfree(data);
}

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index eb35874..afe232b 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -402,7 +402,7 @@ static const char *proc_follow_link(struct dentry *dentry, void **cookie)
return pde->data;
}

-static void proc_put_link(struct dentry *dentry, void *p)
+static void proc_put_link(struct inode *unused, void *p)
{
unuse_pde(p);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed7c9f2..f21e332 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1613,7 +1613,7 @@ struct inode_operations {
struct posix_acl * (*get_acl)(struct inode *, int);

int (*readlink) (struct dentry *, char __user *,int);
- void (*put_link) (struct dentry *, void *);
+ void (*put_link) (struct inode *, void *);

int (*create) (struct inode *,struct dentry *, umode_t, bool);
int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2706,12 +2706,12 @@ extern const struct file_operations generic_ro_fops;
extern int readlink_copy(char __user *, int, const char *);
extern int page_readlink(struct dentry *, char __user *, int);
extern const char *page_follow_link_light(struct dentry *, void **);
-extern void page_put_link(struct dentry *, void *);
+extern void page_put_link(struct inode *, void *);
extern int __page_symlink(struct inode *inode, const char *symname, int len,
int nofs);
extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, void *);
+extern void kfree_put_link(struct inode *, void *);
extern int generic_readlink(struct dentry *, char __user *, int);
extern void generic_fillattr(struct inode *, struct kstat *);
int vfs_getattr_nosec(struct path *path, struct kstat *stat);
diff --git a/mm/shmem.c b/mm/shmem.c
index e026822..a59087e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2486,7 +2486,7 @@ static const char *shmem_follow_link(struct dentry *dentry, void **cookie)
return kmap(page);
}

-static void shmem_put_link(struct dentry *dentry, void *cookie)
+static void shmem_put_link(struct inode *unused, void *cookie)
{
struct page *page = cookie;
kunmap(page);
--
2.1.4

2015-05-11 18:12:26

by Al Viro

Subject: [PATCH v3 100/110] new helper: free_page_put_link()

From: Al Viro <[email protected]>

similar to kfree_put_link()

Signed-off-by: Al Viro <[email protected]>
---
fs/configfs/symlink.c | 7 +------
fs/fuse/dir.c | 7 +------
fs/kernfs/symlink.c | 7 +------
fs/libfs.c | 6 ++++++
include/linux/fs.h | 1 +
5 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index bc464c2..ec5c832 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -296,15 +296,10 @@ static const char *configfs_follow_link(struct dentry *dentry, void **cookie)
return ERR_PTR(error);
}

-static void configfs_put_link(struct inode *unused, void *cookie)
-{
- free_page((unsigned long)cookie);
-}
-
const struct inode_operations configfs_symlink_inode_operations = {
.follow_link = configfs_follow_link,
.readlink = generic_readlink,
- .put_link = configfs_put_link,
+ .put_link = free_page_put_link,
.setattr = configfs_setattr,
};

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 9e704c1..5e2e087 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1395,11 +1395,6 @@ static const char *fuse_follow_link(struct dentry *dentry, void **cookie)
return link;
}

-static void fuse_put_link(struct inode *unused, void *cookie)
-{
- free_page((unsigned long) cookie);
-}
-
static int fuse_dir_open(struct inode *inode, struct file *file)
{
return fuse_open_common(inode, file, true);
@@ -1915,7 +1910,7 @@ static const struct inode_operations fuse_common_inode_operations = {
static const struct inode_operations fuse_symlink_inode_operations = {
.setattr = fuse_setattr,
.follow_link = fuse_follow_link,
- .put_link = fuse_put_link,
+ .put_link = free_page_put_link,
.readlink = generic_readlink,
.getattr = fuse_getattr,
.setxattr = fuse_setxattr,
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index f6aa2e5..db27252 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -126,11 +126,6 @@ static const char *kernfs_iop_follow_link(struct dentry *dentry, void **cookie)
return *cookie = (char *)page;
}

-static void kernfs_iop_put_link(struct inode *unused, void *cookie)
-{
- free_page((unsigned long)cookie);
-}
-
const struct inode_operations kernfs_symlink_iops = {
.setxattr = kernfs_iop_setxattr,
.removexattr = kernfs_iop_removexattr,
@@ -138,7 +133,7 @@ const struct inode_operations kernfs_symlink_iops = {
.listxattr = kernfs_iop_listxattr,
.readlink = generic_readlink,
.follow_link = kernfs_iop_follow_link,
- .put_link = kernfs_iop_put_link,
+ .put_link = free_page_put_link,
.setattr = kernfs_iop_setattr,
.getattr = kernfs_iop_getattr,
.permission = kernfs_iop_permission,
diff --git a/fs/libfs.c b/fs/libfs.c
index 01c337b..65e1fec 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1030,6 +1030,12 @@ void kfree_put_link(struct inode *unused, void *cookie)
}
EXPORT_SYMBOL(kfree_put_link);

+void free_page_put_link(struct inode *unused, void *cookie)
+{
+ free_page((unsigned long) cookie);
+}
+EXPORT_SYMBOL(free_page_put_link);
+
/*
* nop .set_page_dirty method so that people can use .page_mkwrite on
* anon inodes.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f21e332..8f73851 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2712,6 +2712,7 @@ extern int __page_symlink(struct inode *inode, const char *symname, int len,
extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
extern void kfree_put_link(struct inode *, void *);
+extern void free_page_put_link(struct inode *, void *);
extern int generic_readlink(struct dentry *, char __user *, int);
extern void generic_fillattr(struct inode *, struct kstat *);
int vfs_getattr_nosec(struct path *path, struct kstat *stat);
--
2.1.4
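
The consolidation done by this patch, one exported helper replacing several identical per-filesystem callbacks, can be sketched in userspace as follows. All names here (`struct node`, `struct link_ops`, `free_put_link()`, `release_cookie()`) are hypothetical, and `free()` stands in for `free_page()`.

```c
#include <assert.h>
#include <stdlib.h>

struct node;	/* opaque and unused by the helper, like the inode argument */

struct link_ops {
	void (*put_link)(struct node *unused, void *cookie);
};

static int frees;	/* counts calls, for illustration only */

/* One shared helper instead of an identical copy per filesystem. */
static void free_put_link(struct node *unused, void *cookie)
{
	(void)unused;
	free(cookie);
	frees++;
}

/* Two "filesystems" pointing at the same helper. */
static const struct link_ops fs_a_ops = { .put_link = free_put_link };
static const struct link_ops fs_b_ops = { .put_link = free_put_link };

/* Caller side: only invoke the callback when there is a cookie to release. */
static void release_cookie(const struct link_ops *ops, void *cookie)
{
	assert(ops != NULL);
	if (cookie && ops->put_link)
		ops->put_link(NULL, cookie);
}
```

Because the helper ignores its first argument, callers in the sketch can pass NULL for it, much as f2fs passes NULL to page_put_link() in the previous patch.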

2015-05-11 18:12:21

by Al Viro

Subject: [PATCH v3 101/110] namei: make put_link() RCU-safe

From: Al Viro <[email protected]>

very simple - just make path_put() conditional on !RCU.
Note that right now put_link() doesn't get called in RCU mode -
we leave RCU mode before getting anything onto the stack.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4303404..998c3c2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -750,7 +750,8 @@ static inline void put_link(struct nameidata *nd)
struct inode *inode = last->inode;
if (last->cookie && inode->i_op->put_link)
inode->i_op->put_link(inode, last->cookie);
- path_put(&last->link);
+ if (!(nd->flags & LOOKUP_RCU))
+ path_put(&last->link);
}

int sysctl_protected_symlinks __read_mostly = 0;
--
2.1.4
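
The invariant this patch prepares for is that a reference is dropped on pop only if one could have been taken on push. A minimal userspace sketch of that idea, assuming a hypothetical `WALK_LAZY` flag in place of LOOKUP_RCU and a plain integer in place of the path refcounts:

```c
#include <assert.h>

#define WALK_LAZY 0x1	/* hypothetical analogue of LOOKUP_RCU */

struct target {
	int refcount;
};

struct walk {
	unsigned flags;
	struct target *stack[4];
	int depth;
};

/* In lazy mode nothing is pinned; in ref mode each push takes a reference. */
static void push_link(struct walk *w, struct target *t)
{
	assert(w->depth < 4);
	if (!(w->flags & WALK_LAZY))
		t->refcount++;
	w->stack[w->depth++] = t;
}

/* Drop the reference only if push_link() could have taken one. */
static void pop_link(struct walk *w)
{
	struct target *t;

	assert(w->depth > 0);
	t = w->stack[--w->depth];
	if (!(w->flags & WALK_LAZY))
		t->refcount--;
}
```

In the sketch a lazy walker leaves the refcount untouched on both push and pop, so the two modes stay balanced as long as the flag does not change between them.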

2015-05-11 18:12:18

by Al Viro

Subject: [PATCH v3 102/110] namei: make may_follow_link() safe in RCU mode

From: Al Viro <[email protected]>

We *can't* call that audit garbage in RCU mode - it's doing a weird
mix of allocations (GFP_NOFS, immediately followed by GFP_KERNEL)
and I'm not touching that... thing again.

So if this security sclero^Whardening feature gets triggered when
we are in RCU mode, tough - we'll fail with -ECHILD and have
everything restarted in non-RCU mode. Only to hit the same test
and fail, this time with -EACCES and with (oh, rapture) an audit spew
produced.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 998c3c2..20bf494 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -794,6 +794,9 @@ static inline int may_follow_link(struct nameidata *nd)
if (uid_eq(parent->i_uid, inode->i_uid))
return 0;

+ if (nd->flags & LOOKUP_RCU)
+ return -ECHILD;
+
audit_log_link_denied("follow_link", &nd->stack[0].link);
return -EACCES;
}
--
2.1.4
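
The fail-then-retry flow described in this commit message can be sketched in userspace. This is an illustration only: `WALK_LAZY`, `check_link()` and `resolve()` are hypothetical names, and incrementing a counter stands in for audit_log_link_denied().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define WALK_LAZY 0x1	/* hypothetical analogue of LOOKUP_RCU */

static int audit_records;	/* number of audit messages emitted */

/* Hypothetical check: denial wants to log, which may block. */
static int check_link(unsigned flags, bool allowed)
{
	if (allowed)
		return 0;
	if (flags & WALK_LAZY)
		return -ECHILD;	/* ask the caller to retry in blocking mode */
	audit_records++;	/* stands in for audit_log_link_denied() */
	return -EACCES;
}

/* Try lazy mode first; fall back to blocking mode only on -ECHILD. */
static int resolve(bool allowed)
{
	int err = check_link(WALK_LAZY, allowed);

	if (err == -ECHILD)
		err = check_link(0, allowed);
	assert(err != -ECHILD);	/* blocking mode never asks for a retry */
	return err;
}
```

A denied link costs two passes through the check in this sketch, but the audit record is written exactly once, and only from the mode where blocking is allowed.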

2015-05-11 18:12:13

by Al Viro

Subject: [PATCH v3 103/110] new helper: __legitimize_mnt()

From: Al Viro <[email protected]>

same as legitimize_mnt(), except that it does *not* drop and regain
rcu_read_lock; return values are
0 => grabbed a reference, we are fine
1 => failed, just go away
-1 => failed, go away and mntput(bastard) when outside of rcu_read_lock

Signed-off-by: Al Viro <[email protected]>
---
fs/mount.h | 1 +
fs/namespace.c | 27 +++++++++++++++++++--------
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 6a61c2b..b5b8082 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -88,6 +88,7 @@ static inline int is_mounted(struct vfsmount *mnt)
extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
extern struct mount *__lookup_mnt_last(struct vfsmount *, struct dentry *);

+extern int __legitimize_mnt(struct vfsmount *, unsigned);
extern bool legitimize_mnt(struct vfsmount *, unsigned);

extern void __detach_mounts(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index 1b9e111..9c1c43d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -590,24 +590,35 @@ static void delayed_free_vfsmnt(struct rcu_head *head)
}

/* call under rcu_read_lock */
-bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
+int __legitimize_mnt(struct vfsmount *bastard, unsigned seq)
{
struct mount *mnt;
if (read_seqretry(&mount_lock, seq))
- return false;
+ return 1;
if (bastard == NULL)
- return true;
+ return 0;
mnt = real_mount(bastard);
mnt_add_count(mnt, 1);
if (likely(!read_seqretry(&mount_lock, seq)))
- return true;
+ return 0;
if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
mnt_add_count(mnt, -1);
- return false;
+ return 1;
+ }
+ return -1;
+}
+
+/* call under rcu_read_lock */
+bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
+{
+ int res = __legitimize_mnt(bastard, seq);
+ if (likely(!res))
+ return true;
+ if (unlikely(res < 0)) {
+ rcu_read_unlock();
+ mntput(bastard);
+ rcu_read_lock();
}
- rcu_read_unlock();
- mntput(bastard);
- rcu_read_lock();
return false;
}

--
2.1.4
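
The three-way return convention of __legitimize_mnt() and the boolean wrapper built on top of it can be modelled in userspace. The sketch below is single-threaded and uses hypothetical names throughout (`struct obj`, `try_grab()`, `grab()`, `global_seq`, `race_hook`); `race_hook` exists only so that a caller can simulate a writer racing between the two sequence checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct obj {
	int refcount;
	bool sync_teardown;	/* plays the role of MNT_SYNC_UMOUNT */
};

static unsigned global_seq;	/* plays the role of mount_lock's sequence */
static int deferred_puts;	/* full "put" calls made by the wrapper */
static void (*race_hook)(void);	/* simulates a writer between the checks */

static void bump_seq(void)
{
	global_seq++;
}

static bool seq_changed(unsigned seq)
{
	return global_seq != seq;
}

/*
 *  0 => reference grabbed
 *  1 => failed, nothing left to undo
 * -1 => failed, caller must drop the reference once it may block
 */
static int try_grab(struct obj *o, unsigned seq)
{
	if (seq_changed(seq))
		return 1;
	if (o == NULL)
		return 0;
	o->refcount++;
	if (race_hook)
		race_hook();
	if (!seq_changed(seq))
		return 0;
	if (o->sync_teardown) {
		o->refcount--;	/* safe to undo in place */
		return 1;
	}
	return -1;
}

static void put_obj(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
	deferred_puts++;
}

/* Boolean wrapper: performs the deferred cleanup itself. */
static bool grab(struct obj *o, unsigned seq)
{
	int res = try_grab(o, seq);

	if (res == 0)
		return true;
	if (res < 0)
		put_obj(o);	/* the kernel drops rcu_read_lock around mntput() */
	return false;
}
```

Splitting the function this way lets a caller that already has its own exit path out of the read-side section, as unlazy_walk() does later in the series, handle the -1 case itself rather than dropping and retaking the lock in the middle.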

2015-05-11 18:11:12

by Al Viro

Subject: [PATCH v3 104/110] namei: store seq numbers in nd->stack[]

From: Al Viro <[email protected]>

we'll need them for unlazy_walk()

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 20bf494..92bf031 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -508,6 +508,7 @@ struct nameidata {
void *cookie;
const char *name;
struct inode *inode;
+ unsigned seq;
} *stack, internal[EMBEDDED_LEVELS];
};

@@ -1577,6 +1578,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
last->link = *link;
last->cookie = NULL;
last->inode = inode;
+ last->seq = seq;
return 1;
}

--
2.1.4
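
Why a saved sequence number is useful later can be shown with a single-threaded userspace sketch: each stack entry remembers the counter it sampled, and a later pass checks that none of them has moved. The names (`struct item`, `walk_push()`, `walk_validate()`) are hypothetical, and a real seqcount additionally needs memory barriers and handling of odd (write in progress) values, which the sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8

/* Hypothetical object guarded by a sequence counter (even = stable). */
struct item {
	unsigned seq;
	int value;
};

/* What the walker remembers about each object it passed through. */
struct saved {
	const struct item *item;
	unsigned seq;		/* counter value sampled when we looked */
};

struct walk {
	struct saved stack[MAX_DEPTH];
	int depth;
};

/* Writer side: bump the counter before and after the change. */
static void item_update(struct item *it, int value)
{
	it->seq++;
	it->value = value;
	it->seq++;
}

/* Reader side: remember the object together with its current counter. */
static void walk_push(struct walk *w, const struct item *it)
{
	assert(w->depth < MAX_DEPTH);
	w->stack[w->depth].item = it;
	w->stack[w->depth].seq = it->seq;
	w->depth++;
}

/* Later: the saved entries are valid only if no counter has changed. */
static bool walk_validate(const struct walk *w)
{
	for (int i = 0; i < w->depth; i++)
		if (w->stack[i].item->seq != w->stack[i].seq)
			return false;
	return true;
}
```

Without the per-entry `seq`, the validation pass would have nothing to compare against for the older entries, which is the gap this patch closes for nd->stack[].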

2015-05-11 18:11:01

by Al Viro

Subject: [PATCH v3 105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

From: Al Viro <[email protected]>

We are almost done - primitives for leaving RCU mode are aware of nd->stack
now, and a new primitive for going to non-RCU mode when we have a symlink on
our hands has been added.

The thing we are heavily relying upon is that *any* unlazy failure will be
shortly followed by terminate_walk(), with no access to nameidata in between.
So it's enough to leave things in a state terminate_walk() can cope with.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 88 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 92bf031..090214b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -554,6 +554,50 @@ static inline int nd_alloc_stack(struct nameidata *nd)
return __nd_alloc_stack(nd);
}

+static void drop_links(struct nameidata *nd)
+{
+ int i = nd->depth;
+ while (i--) {
+ struct saved *last = nd->stack + i;
+ struct inode *inode = last->inode;
+ if (last->cookie && inode->i_op->put_link) {
+ inode->i_op->put_link(inode, last->cookie);
+ last->cookie = NULL;
+ }
+ }
+}
+
+static bool legitimize_path(struct nameidata *nd,
+ struct path *path, unsigned seq)
+{
+ int res = __legitimize_mnt(path->mnt, nd->m_seq);
+ if (unlikely(res)) {
+ if (res > 0)
+ path->mnt = NULL;
+ path->dentry = NULL;
+ return false;
+ }
+ if (unlikely(!lockref_get_not_dead(&path->dentry->d_lockref))) {
+ path->dentry = NULL;
+ return false;
+ }
+ return !read_seqcount_retry(&path->dentry->d_seq, seq);
+}
+
+static bool legitimize_links(struct nameidata *nd)
+{
+ int i;
+ for (i = 0; i < nd->depth; i++) {
+ struct saved *last = nd->stack + i;
+ if (unlikely(!legitimize_path(nd, &last->link, last->seq))) {
+ drop_links(nd);
+ nd->depth = i;
+ return false;
+ }
+ }
+ return true;
+}
+
/*
* Path walking has 2 modes, rcu-walk and ref-walk (see
* Documentation/filesystems/path-lookup.txt). In situations when we can't
@@ -575,25 +619,33 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* unlazy_walk attempts to legitimize the current nd->path, nd->root and dentry
* for ref-walk mode. @dentry must be a path found by a do_lookup call on
* @nd or NULL. Must be called from rcu-walk context.
+ * Nothing should touch nameidata between unlazy_walk() failure and
+ * terminate_walk().
*/
static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq)
{
struct fs_struct *fs = current->fs;
struct dentry *parent = nd->path.dentry;
+ int res;

BUG_ON(!(nd->flags & LOOKUP_RCU));

- /*
- * After legitimizing the bastards, terminate_walk()
- * will do the right thing for non-RCU mode, and all our
- * subsequent exit cases should rcu_read_unlock()
- * before returning. Do vfsmount first; if dentry
- * can't be legitimized, just set nd->path.dentry to NULL
- * and rely on dput(NULL) being a no-op.
- */
- if (!legitimize_mnt(nd->path.mnt, nd->m_seq))
- return -ECHILD;
nd->flags &= ~LOOKUP_RCU;
+ if (unlikely(!legitimize_links(nd))) {
+ rcu_read_unlock();
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ goto drop_root_mnt;
+ }
+ res = __legitimize_mnt(nd->path.mnt, nd->m_seq);
+ if (unlikely(res)) {
+ rcu_read_unlock();
+ if (res < 0)
+ mntput(nd->path.mnt);
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ goto drop_root_mnt;
+ }

if (!lockref_get_not_dead(&parent->d_lockref)) {
nd->path.dentry = NULL;
@@ -651,6 +703,23 @@ drop_root_mnt:
return -ECHILD;
}

+static int unlazy_link(struct nameidata *nd, struct path *link, unsigned seq)
+{
+ if (unlikely(!legitimize_path(nd, link, seq))) {
+ drop_links(nd);
+ rcu_read_unlock();
+ nd->flags &= ~LOOKUP_RCU;
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ if (!(nd->flags & LOOKUP_ROOT))
+ nd->root.mnt = NULL;
+ } else if (likely(unlazy_walk(nd, NULL, 0)) == 0) {
+ return 0;
+ }
+ path_put(link);
+ return -ECHILD;
+}
+
static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
{
return dentry->d_op->d_revalidate(dentry, flags);
@@ -1539,16 +1608,19 @@ static inline int handle_dots(struct nameidata *nd, int type)

static void terminate_walk(struct nameidata *nd)
{
+ drop_links(nd);
if (!(nd->flags & LOOKUP_RCU)) {
+ int i;
path_put(&nd->path);
+ for (i = 0; i < nd->depth; i++)
+ path_put(&nd->stack[i].link);
} else {
nd->flags &= ~LOOKUP_RCU;
if (!(nd->flags & LOOKUP_ROOT))
nd->root.mnt = NULL;
rcu_read_unlock();
}
- while (unlikely(nd->depth))
- put_link(nd);
+ nd->depth = 0;
}

static int pick_link(struct nameidata *nd, struct path *link,
@@ -1561,13 +1633,12 @@ static int pick_link(struct nameidata *nd, struct path *link,
return -ELOOP;
}
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != link->mnt ||
- unlazy_walk(nd, link->dentry, seq))) {
+ if (unlikely(unlazy_link(nd, link, seq)))
return -ECHILD;
- }
+ } else {
+ if (link->mnt == nd->path.mnt)
+ mntget(link->mnt);
}
- if (link->mnt == nd->path.mnt)
- mntget(link->mnt);
error = nd_alloc_stack(nd);
if (unlikely(error)) {
path_put(link);
--
2.1.4

2015-05-11 18:11:07

by Al Viro

Subject: [PATCH v3 106/110] namei: don't unlazy until get_link()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 090214b..4d6065d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -536,10 +536,19 @@ static void restore_nameidata(struct nameidata *old)

static int __nd_alloc_stack(struct nameidata *nd)
{
- struct saved *p = kmalloc(MAXSYMLINKS * sizeof(struct saved),
+ struct saved *p;
+
+ if (nd->flags & LOOKUP_RCU) {
+ p= kmalloc(MAXSYMLINKS * sizeof(struct saved),
+ GFP_ATOMIC);
+ if (unlikely(!p))
+ return -ECHILD;
+ } else {
+ p= kmalloc(MAXSYMLINKS * sizeof(struct saved),
GFP_KERNEL);
- if (unlikely(!p))
- return -ENOMEM;
+ if (unlikely(!p))
+ return -ENOMEM;
+ }
memcpy(p, nd->internal, sizeof(nd->internal));
nd->stack = p;
return 0;
@@ -949,8 +958,10 @@ const char *get_link(struct nameidata *nd)
int error;
const char *res;

- BUG_ON(nd->flags & LOOKUP_RCU);
-
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(unlazy_walk(nd, NULL, 0)))
+ return ERR_PTR(-ECHILD);
+ }
cond_resched();

touch_atime(&last->link);
@@ -1632,17 +1643,21 @@ static int pick_link(struct nameidata *nd, struct path *link,
path_to_nameidata(link, nd);
return -ELOOP;
}
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(unlazy_link(nd, link, seq)))
- return -ECHILD;
- } else {
+ if (!(nd->flags & LOOKUP_RCU)) {
if (link->mnt == nd->path.mnt)
mntget(link->mnt);
}
error = nd_alloc_stack(nd);
if (unlikely(error)) {
- path_put(link);
- return error;
+ if (error == -ECHILD) {
+ if (unlikely(unlazy_link(nd, link, seq)))
+ return -ECHILD;
+ error = nd_alloc_stack(nd);
+ }
+ if (error) {
+ path_put(link);
+ return error;
+ }
}

last = nd->stack + nd->depth++;
--
2.1.4

2015-05-11 18:10:59

by Al Viro

Subject: [PATCH v3 107/110] VFS/namei: make the use of touch_atime() in get_link() RCU-safe.

From: NeilBrown <[email protected]>

touch_atime is not RCU-safe, and so cannot be called on an RCU walk.
However, in situations where RCU-walk makes a difference, the symlink
will likely be accessed much more often than it is useful to update
the atime.

So split out the test of "Does the atime actually need to be updated"
into atime_needs_update(), and have get_link() unlazy if it finds that
it will need to do that update.
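
The split described above can be sketched in userspace as follows. This is
only a toy model, not the kernel code: toy_inode, toy_atime_needs_update(),
toy_touch_atime() and toy_get_link() are hypothetical names, the relatime and
mount-flag checks are left out, and "leaving lazy mode" is reduced to
clearing a flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct inode that matter here. */
struct toy_inode {
	bool noatime;	/* models S_NOATIME and friends */
	long atime;	/* last recorded access time */
};

/* Pure check: reads only, so it is safe to call in the lazy mode. */
bool toy_atime_needs_update(const struct toy_inode *inode, long now)
{
	if (inode->noatime)
		return false;
	if (inode->atime == now)
		return false;
	return true;
}

/* The updater reuses the check, then does the part that may block. */
void toy_touch_atime(struct toy_inode *inode, long now)
{
	if (!toy_atime_needs_update(inode, now))
		return;
	inode->atime = now;
}

/*
 * Models the decision in get_link(): stay lazy unless an update is
 * really needed.  Returns true if this call had to leave lazy mode.
 */
bool toy_get_link(struct toy_inode *inode, long now, bool *lazy)
{
	if (!*lazy) {
		toy_touch_atime(inode, now);
		return false;
	}
	if (toy_atime_needs_update(inode, now)) {
		*lazy = false;	/* stands in for a successful unlazy_walk() */
		toy_touch_atime(inode, now);
		return true;
	}
	return false;
}
```

In this model a symlink whose atime is already current, or which is marked
noatime, is traversed without ever leaving the lazy mode.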

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
fs/inode.c | 30 +++++++++++++++++++++---------
fs/namei.c | 12 +++++++++---
include/linux/fs.h | 1 +
3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 952fb48..e8d6268 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1585,36 +1585,47 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
* This function automatically handles read only file systems and media,
* as well as the "noatime" flag and inode specific "noatime" markers.
*/
-void touch_atime(const struct path *path)
+bool atime_needs_update(const struct path *path, struct inode *inode)
{
struct vfsmount *mnt = path->mnt;
- struct inode *inode = d_inode(path->dentry);
struct timespec now;

if (inode->i_flags & S_NOATIME)
- return;
+ return false;
if (IS_NOATIME(inode))
- return;
+ return false;
if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
- return;
+ return false;

if (mnt->mnt_flags & MNT_NOATIME)
- return;
+ return false;
if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
- return;
+ return false;

now = current_fs_time(inode->i_sb);

if (!relatime_need_update(mnt, inode, now))
- return;
+ return false;

if (timespec_equal(&inode->i_atime, &now))
+ return false;
+
+ return true;
+}
+
+void touch_atime(const struct path *path)
+{
+ struct vfsmount *mnt = path->mnt;
+ struct inode *inode = d_inode(path->dentry);
+ struct timespec now;
+
+ if (!atime_needs_update(path, inode))
return;

if (!sb_start_write_trylock(inode->i_sb))
return;

- if (__mnt_want_write(mnt))
+ if (__mnt_want_write(mnt) != 0)
goto skip_update;
/*
* File systems can error out when updating inodes if they need to
@@ -1625,6 +1636,7 @@ void touch_atime(const struct path *path)
* We may also fail on filesystems that have the ability to make parts
* of the fs read only, e.g. subvolumes in Btrfs.
*/
+ now = current_fs_time(inode->i_sb);
update_time(inode, &now, S_ATIME);
__mnt_drop_write(mnt);
skip_update:
diff --git a/fs/namei.c b/fs/namei.c
index 4d6065d..9ef8c50 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -958,13 +958,19 @@ const char *get_link(struct nameidata *nd)
int error;
const char *res;

- if (nd->flags & LOOKUP_RCU) {
+ if (!(nd->flags & LOOKUP_RCU)) {
+ touch_atime(&last->link);
+ } else if (atime_needs_update(&last->link, inode)) {
if (unlikely(unlazy_walk(nd, NULL, 0)))
return ERR_PTR(-ECHILD);
+ touch_atime(&last->link);
}
- cond_resched();

- touch_atime(&last->link);
+ _cond_resched();
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(unlazy_walk(nd, NULL, 0)))
+ return ERR_PTR(-ECHILD);
+ }

error = security_inode_follow_link(dentry, inode,
nd->flags & LOOKUP_RCU);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8f73851..1426c43 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1880,6 +1880,7 @@ enum file_time_flags {
S_VERSION = 8,
};

+extern bool atime_needs_update(const struct path *, struct inode *);
extern void touch_atime(const struct path *);
static inline void file_accessed(struct file *file)
{
--
2.1.4

2015-05-11 18:09:22

by Al Viro

Subject: [PATCH v3 108/110] enable passing fast relative symlinks without dropping out of RCU mode

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9ef8c50..1771143 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -967,10 +967,6 @@ const char *get_link(struct nameidata *nd)
}

_cond_resched();
- if (nd->flags & LOOKUP_RCU) {
- if (unlikely(unlazy_walk(nd, NULL, 0)))
- return ERR_PTR(-ECHILD);
- }

error = security_inode_follow_link(dentry, inode,
nd->flags & LOOKUP_RCU);
@@ -980,6 +976,10 @@ const char *get_link(struct nameidata *nd)
nd->last_type = LAST_BIND;
res = inode->i_link;
if (!res) {
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(unlazy_walk(nd, NULL, 0)))
+ return ERR_PTR(-ECHILD);
+ }
res = inode->i_op->follow_link(dentry, &last->cookie);
if (IS_ERR_OR_NULL(res)) {
last->cookie = NULL;
@@ -987,6 +987,10 @@ const char *get_link(struct nameidata *nd)
}
}
if (*res == '/') {
+ if (nd->flags & LOOKUP_RCU) {
+ if (unlikely(unlazy_walk(nd, NULL, 0)))
+ return ERR_PTR(-ECHILD);
+ }
if (!nd->root.mnt)
set_root(nd);
path_put(&nd->path);
--
2.1.4

2015-05-11 18:09:25

by Al Viro

Subject: [PATCH v3 109/110] namei: handle absolute symlinks without dropping out of RCU mode

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1771143..fc8df8b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -499,7 +499,7 @@ struct nameidata {
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
- unsigned seq, m_seq;
+ unsigned seq, m_seq, root_seq;
int last_type;
unsigned depth;
int total_link_count;
@@ -780,14 +780,14 @@ static __always_inline void set_root(struct nameidata *nd)
static __always_inline unsigned set_root_rcu(struct nameidata *nd)
{
struct fs_struct *fs = current->fs;
- unsigned seq, res;
+ unsigned seq;

do {
seq = read_seqcount_begin(&fs->seq);
nd->root = fs->root;
- res = __read_seqcount_begin(&nd->root.dentry->d_seq);
+ nd->root_seq = __read_seqcount_begin(&nd->root.dentry->d_seq);
} while (read_seqcount_retry(&fs->seq, seq));
- return res;
+ return nd->root_seq;
}

static void path_put_conditional(struct path *path, struct nameidata *nd)
@@ -988,15 +988,23 @@ const char *get_link(struct nameidata *nd)
}
if (*res == '/') {
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(unlazy_walk(nd, NULL, 0)))
+ struct dentry *d;
+ if (!nd->root.mnt)
+ set_root_rcu(nd);
+ nd->path = nd->root;
+ d = nd->path.dentry;
+ nd->inode = d->d_inode;
+ nd->seq = nd->root_seq;
+ if (unlikely(read_seqcount_retry(&d->d_seq, nd->seq)))
return ERR_PTR(-ECHILD);
+ } else {
+ if (!nd->root.mnt)
+ set_root(nd);
+ path_put(&nd->path);
+ nd->path = nd->root;
+ path_get(&nd->root);
+ nd->inode = nd->path.dentry->d_inode;
}
- if (!nd->root.mnt)
- set_root(nd);
- path_put(&nd->path);
- nd->path = nd->root;
- path_get(&nd->root);
- nd->inode = nd->path.dentry->d_inode;
nd->flags |= LOOKUP_JUMPED;
while (unlikely(*++res == '/'))
;
--
2.1.4

2015-05-11 18:09:18

by Al Viro

Subject: [PATCH v3 110/110] update Documentation/filesystems/ regarding the follow_link/put_link changes

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/porting | 17 +++++++++++++++++
Documentation/filesystems/vfs.txt | 18 ++++++++++--------
2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index e69274d..3eae250 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -483,3 +483,20 @@ in your dentry operations instead.
--
[mandatory]
->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.
+---
+[recommended]
+ for embedded ("fast") symlinks just set inode->i_link to wherever the
+ symlink body is and use simple_follow_link() as ->follow_link().
+--
+[mandatory]
+ calling conventions for ->follow_link() have changed. Instead of returning
+ cookie and using nd_set_link() to store the body to traverse, we return
+ the body to traverse and store the cookie using explicit void ** argument.
+ nameidata isn't passed at all - nd_jump_link() doesn't need it and
+ nd_[gs]et_link() is gone.
+--
+[mandatory]
+ calling conventions for ->put_link() have changed. It gets inode instead of
+ dentry, it does not get nameidata at all and it gets called only when cookie
+ is non-NULL. Note that link body isn't available anymore, so if you need it,
+ store it as cookie.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 542d935..b403b29 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -436,16 +436,18 @@ otherwise noted.

follow_link: called by the VFS to follow a symbolic link to the
inode it points to. Only required if you want to support
- symbolic links. This method returns a void pointer cookie
- that is passed to put_link().
+ symbolic links. This method returns the symlink body
+ to traverse (and possibly resets the current position with
+ nd_jump_link()). If the body won't go away until the inode
+ is gone, nothing else is needed; if it needs to be otherwise
+ pinned, the data needed to release whatever we'd grabbed
+ is to be stored in void * variable passed by address to
+ follow_link() instance.

put_link: called by the VFS to release resources allocated by
- follow_link(). The cookie returned by follow_link() is passed
- to this method as the last parameter. It is used by
- filesystems such as NFS where page cache is not stable
- (i.e. page that was installed when the symbolic link walk
- started might not be in the page cache at the end of the
- walk).
+ follow_link(). The cookie stored by follow_link() is passed
+ to this method as the last parameter; only called when
+ cookie isn't NULL.

permission: called by the VFS to check for access rights on a POSIX-like
filesystem.
--
2.1.4

2015-05-11 18:20:43

by Linus Torvalds

2015-05-11 23:43:37

by Al Viro

Subject: Re: [PATCH v3 105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

On Mon, May 11, 2015 at 07:08:05PM +0100, Al Viro wrote:
> +static bool legitimize_links(struct nameidata *nd)
> +{
> + int i;
> + for (i = 0; i < nd->depth; i++) {
> + struct saved *last = nd->stack + i;
> + if (unlikely(!legitimize_path(nd, &last->link, last->seq))) {
> + drop_links(nd);
> + nd->depth = i;

Broken, actually - it should be i + 1. What happens is that we attempt to
grab references on nd->stack[...].link; if everything succeeds, we'd won.
If legitimizing nd->stack[i].link fails (e.g. ->d_seq has changed on us),
we
* put_link everything in stack and clear nd->stack[...].cookie, making
sure that nobody will call ->put_link() on it later.
* leave the things for terminate_walk() so that it would do
path_put() on everything we have grabbed and ignore everything we hadn't
even got around to.

But this failed legitimize_path() requires path_put() - we *can't* block
there (we wouldn't be able to do ->put_link() afterwards if we did), so
we just zero what we didn't grab and leave what we had for subsequent
path_put(). Which may be anything from "nothing" (mount_lock has been
touched) to "both vfsmount and dentry" (->d_seq mismatch).

So we need to set nd->depth to i + 1 here, not i. As it is, we are risking
a vfsmount (and possibly dentry) leak. Fixed and force-pushed...
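
The off-by-one above can be reproduced with a small userspace model. This is
a sketch under simplifying assumptions, not the fs/namei.c code: all toy_*
names are hypothetical, references are plain counters, and the "extra"
parameter selects between the broken (0) and fixed (1) value of nd->depth.

```c
#include <stdbool.h>

/* Hypothetical outcomes of trying to pin one saved link. */
enum toy_outcome {
	TOY_PIN_OK,		/* both references taken, seq still matches */
	TOY_PIN_NO_MNT,		/* nothing could be grabbed */
	TOY_PIN_DEAD_DENTRY,	/* vfsmount grabbed, dentry was not */
	TOY_PIN_SEQ_CHANGED	/* both grabbed, but the seq check failed */
};

struct toy_link {
	enum toy_outcome outcome;
	bool mnt_held;
	bool dentry_held;
};

struct toy_nd {
	struct toy_link stack[4];
	int depth;
	int refs_outstanding;
};

/* On failure the entry keeps whatever was grabbed before the failure. */
static bool toy_legitimize_path(struct toy_nd *nd, struct toy_link *l)
{
	if (l->outcome == TOY_PIN_NO_MNT)
		return false;
	l->mnt_held = true;
	nd->refs_outstanding++;
	if (l->outcome == TOY_PIN_DEAD_DENTRY)
		return false;
	l->dentry_held = true;
	nd->refs_outstanding++;
	return l->outcome == TOY_PIN_OK;
}

static bool toy_legitimize_links(struct toy_nd *nd, int extra)
{
	for (int i = 0; i < nd->depth; i++) {
		if (!toy_legitimize_path(nd, &nd->stack[i])) {
			nd->depth = i + extra;	/* 0: as posted, 1: fixed */
			return false;
		}
	}
	return true;
}

/* Drops whatever is held in the first nd->depth entries, nothing else. */
static void toy_terminate_walk(struct toy_nd *nd)
{
	for (int i = 0; i < nd->depth; i++) {
		struct toy_link *l = &nd->stack[i];
		if (l->mnt_held) {
			l->mnt_held = false;
			nd->refs_outstanding--;
		}
		if (l->dentry_held) {
			l->dentry_held = false;
			nd->refs_outstanding--;
		}
	}
	nd->depth = 0;
}

/* Three stacked links; entry fail_at (0..2) fails in the given way. */
int toy_leaked_refs(int fail_at, enum toy_outcome how, int extra)
{
	struct toy_nd nd = { .depth = 3 };

	nd.stack[fail_at].outcome = how;
	toy_legitimize_links(&nd, extra);
	toy_terminate_walk(&nd);
	return nd.refs_outstanding;
}
```

With extra == 0 the failing entry is never visited by the cleanup loop, so
whatever it managed to grab stays pinned; with extra == 1 the same loop
covers it and the count goes back to zero.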

2015-05-12 04:10:08

by Al Viro

Subject: Re: [PATCH v3 105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

On Tue, May 12, 2015 at 12:43:33AM +0100, Al Viro wrote:
> On Mon, May 11, 2015 at 07:08:05PM +0100, Al Viro wrote:
> > +static bool legitimize_links(struct nameidata *nd)
> > +{
> > + int i;
> > + for (i = 0; i < nd->depth; i++) {
> > + struct saved *last = nd->stack + i;
> > + if (unlikely(!legitimize_path(nd, &last->link, last->seq))) {
> > + drop_links(nd);
> > + nd->depth = i;
>
> Broken, actually - it should be i + 1. What happens is that we attempt to
> grab references on nd->stack[...].link; if everything succeeds, we'd won.
> If legitimizing nd->stack[i].link fails (e.g. ->d_seq has changed on us),
> we
> * put_link everything in stack and clear nd->stack[...].cookie, making
> sure that nobody will call ->put_link() on it later.
> * leave the things for terminate_walk() so that it would do
> path_put() on everything we have grabbed and ignored everything we hadn't
> even got around to.
>
> But this failed legitimize_path() requires path_put() - we *can't* block
> there (we wouldn't be able to do ->put_link() afterwards if we did), so
> we just zero what we didn't grab and leave what we had for subsequent
> path_put(). Which may be anything from "nothing" (mount_lock has been
> touched) to "both vfsmount and dentry" (->d_seq mismatch).
>
> So we need to set nd->depth to i + 1 here, not i. As it is, we are risking
> a vfsmount (and possibly dentry) leak. Fixed and force-pushed...

FWIW, below is a better replacement; tested and force-pushed. And seeing that
we just got nd->root_seq, I wonder if we really need messing with
current->fs there - something like
if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
if (unlikely(!legitimize_path(nd, &nd->root, nd->root_seq))) {
rcu_read_unlock();
dput(dentry);
return -ECHILD;
}
}
should do better than playing with fs->lock, etc. we do right now...

diff --git a/fs/namei.c b/fs/namei.c
index 92bf031..6db14f2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -554,6 +554,68 @@ static inline int nd_alloc_stack(struct nameidata *nd)
return __nd_alloc_stack(nd);
}

+static void drop_links(struct nameidata *nd)
+{
+ int i = nd->depth;
+ while (i--) {
+ struct saved *last = nd->stack + i;
+ struct inode *inode = last->inode;
+ if (last->cookie && inode->i_op->put_link) {
+ inode->i_op->put_link(inode, last->cookie);
+ last->cookie = NULL;
+ }
+ }
+}
+
+static void terminate_walk(struct nameidata *nd)
+{
+ drop_links(nd);
+ if (!(nd->flags & LOOKUP_RCU)) {
+ int i;
+ path_put(&nd->path);
+ for (i = 0; i < nd->depth; i++)
+ path_put(&nd->stack[i].link);
+ } else {
+ nd->flags &= ~LOOKUP_RCU;
+ if (!(nd->flags & LOOKUP_ROOT))
+ nd->root.mnt = NULL;
+ rcu_read_unlock();
+ }
+ nd->depth = 0;
+}
+
+/* path_put is needed afterwards regardless of success or failure */
+static bool legitimize_path(struct nameidata *nd,
+ struct path *path, unsigned seq)
+{
+ int res = __legitimize_mnt(path->mnt, nd->m_seq);
+ if (unlikely(res)) {
+ if (res > 0)
+ path->mnt = NULL;
+ path->dentry = NULL;
+ return false;
+ }
+ if (unlikely(!lockref_get_not_dead(&path->dentry->d_lockref))) {
+ path->dentry = NULL;
+ return false;
+ }
+ return !read_seqcount_retry(&path->dentry->d_seq, seq);
+}
+
+static bool legitimize_links(struct nameidata *nd)
+{
+ int i;
+ for (i = 0; i < nd->depth; i++) {
+ struct saved *last = nd->stack + i;
+ if (unlikely(!legitimize_path(nd, &last->link, last->seq))) {
+ drop_links(nd);
+ nd->depth = i + 1;
+ return false;
+ }
+ }
+ return true;
+}
+
/*
* Path walking has 2 modes, rcu-walk and ref-walk (see
* Documentation/filesystems/path-lookup.txt). In situations when we can't
@@ -575,6 +637,8 @@ static inline int nd_alloc_stack(struct nameidata *nd)
* unlazy_walk attempts to legitimize the current nd->path, nd->root and dentry
* for ref-walk mode. @dentry must be a path found by a do_lookup call on
* @nd or NULL. Must be called from rcu-walk context.
+ * Nothing should touch nameidata between unlazy_walk() failure and
+ * terminate_walk().
*/
static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq)
{
@@ -583,22 +647,13 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq

BUG_ON(!(nd->flags & LOOKUP_RCU));

- /*
- * After legitimizing the bastards, terminate_walk()
- * will do the right thing for non-RCU mode, and all our
- * subsequent exit cases should rcu_read_unlock()
- * before returning. Do vfsmount first; if dentry
- * can't be legitimized, just set nd->path.dentry to NULL
- * and rely on dput(NULL) being a no-op.
- */
- if (!legitimize_mnt(nd->path.mnt, nd->m_seq))
- return -ECHILD;
nd->flags &= ~LOOKUP_RCU;
-
- if (!lockref_get_not_dead(&parent->d_lockref)) {
- nd->path.dentry = NULL;
- goto out;
- }
+ if (unlikely(!legitimize_links(nd)))
+ goto out2;
+ if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq)))
+ goto out2;
+ if (unlikely(!lockref_get_not_dead(&parent->d_lockref)))
+ goto out1;

/*
* For a negative lookup, the lookup sequence point is the parents
@@ -628,8 +683,10 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq
*/
if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
spin_lock(&fs->lock);
- if (nd->root.mnt != fs->root.mnt || nd->root.dentry != fs->root.dentry)
- goto unlock_and_drop_dentry;
+ if (unlikely(!path_equal(&nd->root, &fs->root))) {
+ spin_unlock(&fs->lock);
+ goto drop_dentry;
+ }
path_get(&nd->root);
spin_unlock(&fs->lock);
}
@@ -637,12 +694,14 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq
rcu_read_unlock();
return 0;

-unlock_and_drop_dentry:
- spin_unlock(&fs->lock);
drop_dentry:
rcu_read_unlock();
dput(dentry);
goto drop_root_mnt;
+out2:
+ nd->path.mnt = NULL;
+out1:
+ nd->path.dentry = NULL;
out:
rcu_read_unlock();
drop_root_mnt:
@@ -651,6 +710,23 @@ drop_root_mnt:
return -ECHILD;
}

+static int unlazy_link(struct nameidata *nd, struct path *link, unsigned seq)
+{
+ if (unlikely(!legitimize_path(nd, link, seq))) {
+ drop_links(nd);
+ rcu_read_unlock();
+ nd->flags &= ~LOOKUP_RCU;
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ if (!(nd->flags & LOOKUP_ROOT))
+ nd->root.mnt = NULL;
+ } else if (likely(unlazy_walk(nd, NULL, 0)) == 0) {
+ return 0;
+ }
+ path_put(link);
+ return -ECHILD;
+}
+
static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
{
return dentry->d_op->d_revalidate(dentry, flags);
@@ -1537,20 +1613,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
return 0;
}

-static void terminate_walk(struct nameidata *nd)
-{
- if (!(nd->flags & LOOKUP_RCU)) {
- path_put(&nd->path);
- } else {
- nd->flags &= ~LOOKUP_RCU;
- if (!(nd->flags & LOOKUP_ROOT))
- nd->root.mnt = NULL;
- rcu_read_unlock();
- }
- while (unlikely(nd->depth))
- put_link(nd);
-}
-
static int pick_link(struct nameidata *nd, struct path *link,
struct inode *inode, unsigned seq)
{
@@ -1561,13 +1623,12 @@ static int pick_link(struct nameidata *nd, struct path *link,
return -ELOOP;
}
if (nd->flags & LOOKUP_RCU) {
- if (unlikely(nd->path.mnt != link->mnt ||
- unlazy_walk(nd, link->dentry, seq))) {
+ if (unlikely(unlazy_link(nd, link, seq)))
return -ECHILD;
- }
+ } else {
+ if (link->mnt == nd->path.mnt)
+ mntget(link->mnt);
}
- if (link->mnt == nd->path.mnt)
- mntget(link->mnt);
error = nd_alloc_stack(nd);
if (unlikely(error)) {
path_put(link);

2015-05-13 22:25:36

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

More on top of the current vfs.git#for-next (== the posted patchset
with a couple of fixes): more fs/namei.c reorganization and stack footprint
reduction (below 1Kb now). One interesting piece of that is that we don't
touch current->fs->lock anymore - unlazy_walk() used to, but now we can
get rid of that.

FWIW, at that point I'm starting to seriously look into a primitive
that would take the usual dfd+name+flags and (path x inode x bool -> int)
callback (since we don't have closures, it'd have to be
int filename_apply(int dfd, struct filename *name, unsigned flags,
int (*act)(struct path *path,
struct inode *inode,
bool may_block,
void *ctx),
void *ctx);
) with lookup done and if it ends up at something positive, act() called
for it. If we end up reaching the very end in RCU mode, act() gets called
with false as the third argument, _without_ dropping rcu_read_lock() or
grabbing references. It may return -ECHILD, in which case we'll unlazy
and call it again with may_block being true; if it does *not* return
-ECHILD, we'll check for mount lock and d_seq still being valid. If they
are, we are done, if not - restart the lookup from scratch in non-lazy mode.
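
The calling protocol described above could look roughly like this in a
userspace model. Everything here is an assumption made for illustration:
toy_walk, toy_apply() and the two callbacks are hypothetical, an int stands
in for the path and inode pair, the still_valid flag stands in for the
mount_lock and d_seq checks, and unlazy is assumed to succeed.

```c
#include <errno.h>
#include <stdbool.h>

typedef int (*toy_act_fn)(int object, bool may_block, void *ctx);

struct toy_walk {
	bool still_valid;	/* models "mount lock and d_seq still valid" */
	int object;		/* models the path and inode we ended up at */
	int lazy_calls;
	int blocking_calls;
	int restarts;
};

int toy_apply(struct toy_walk *w, toy_act_fn act, void *ctx)
{
	int err;

	/* First attempt: no references grabbed, act() must not block. */
	w->lazy_calls++;
	err = act(w->object, false, ctx);
	if (err == -ECHILD) {
		/* act() asked for blocking mode: unlazy and call it again. */
		w->blocking_calls++;
		return act(w->object, true, ctx);
	}
	if (w->still_valid)
		return err;

	/* The lazy result cannot be trusted: redo the lookup non-lazily. */
	w->restarts++;
	w->blocking_calls++;
	return act(w->object, true, ctx);
}

/* A callback that never needs to block. */
int toy_act_cheap(int object, bool may_block, void *ctx)
{
	(void)may_block;
	*(int *)ctx = object;
	return 0;
}

/* A callback that refuses to run without permission to block. */
int toy_act_needs_sleep(int object, bool may_block, void *ctx)
{
	if (!may_block)
		return -ECHILD;
	*(int *)ctx = object;
	return 0;
}
```

In the cheap case nothing shared is touched at all; the blocking path is
taken only when the callback asks for it or when validation fails.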

Basically, that's your "could we get stat(2) without ever dirtying
anything shared?" thing, except that it might be applicable to some of the
getxattr(), statfs(), access(), listxattr() and readlink() as well.
The obstacles for stat() are
* ->d_weak_revalidate() needs to be taught about being called in
RCU mode. Not a problem - flags are already passed and one of two instances
is already checking for LOOKUP_RCU (what with being ->d_revalidate() at the
same time). That one applies to all of them, not just stat().
* Linux S&M with its usual habit of sticking hooks into every
orifice out there (and if there hadn't been one, the hook still goes in,
of course). In this case it's not just selinux, as with follow_link -
apparmor, tomoyo and smack are also there. selinux one looks like it could
be made to work if given an inode and may_block in addition to struct path;
the rest... no idea.
* telling ->getattr() that we are in RCU mode. And giving it
inode, of course. As the first approximation, we could live with just
the "if ->getattr isn't NULL, chicken out and return -ECHILD", but e.g.
ext4, btrfs and xfs have non-NULL ->getattr(). In this case I wonder if
adding a new method wouldn't be the right thing...

Overall, it seems to be doable, and with the results of massage
already done to fs/namei.c the PITA promises to be fairly limited. How
generic do we really want it? I mean, is e.g. access(2) (faccessat(2))
worth bothering with? Or getxattr(2), for that matter... Comments?

Anyway, additional pieces of the series follow:

namei: unlazy_walk() doesn't need to mess with current->fs anymore
lustre: kill unused macro (LOOKUP_CONTINUE)
lustre: kill unused helper
get rid of assorted nameidata-related debris
[Neil's] Documentation: remove outdated information from automount-support.txt
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
namei: uninline set_root{,_rcu}()
namei: pass the struct path to store the result down into path_lookupat()
namei: move putname() call into filename_lookup()
namei: shift nameidata inside filename_lookup()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata down into filename_parentat()
namei: saner calling conventions for filename_create()
namei: saner calling conventions for filename_parentat()
namei: fold path_cleanup() into terminate_walk()
namei: stash dfd and name into nameidata
namei: trim do_last() arguments
inline user_path_parent()
inline user_path_create()
namei: move saved_nd pointer into struct nameidata
turn user_{path_at,path,lpath,path_dir}() into static inlines

2015-05-13 22:31:03

by Al Viro

Subject: [PATCH 01/21] namei: unlazy_walk() doesn't need to mess with current->fs anymore

From: Al Viro <[email protected]>

now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq)
will do just fine...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3ee1d65..e736722 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -651,7 +651,6 @@ static bool legitimize_links(struct nameidata *nd)
*/
static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq)
{
- struct fs_struct *fs = current->fs;
struct dentry *parent = nd->path.dentry;

BUG_ON(!(nd->flags & LOOKUP_RCU));
@@ -691,13 +690,11 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry, unsigned seq
* still valid and get it if required.
*/
if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
- spin_lock(&fs->lock);
- if (unlikely(!path_equal(&nd->root, &fs->root))) {
- spin_unlock(&fs->lock);
- goto drop_dentry;
+ if (unlikely(!legitimize_path(nd, &nd->root, nd->root_seq))) {
+ rcu_read_unlock();
+ dput(dentry);
+ return -ECHILD;
}
- path_get(&nd->root);
- spin_unlock(&fs->lock);
}

rcu_read_unlock();
--
2.1.4

2015-05-13 22:30:57

by Al Viro

Subject: [PATCH 02/21] lustre: kill unused macro (LOOKUP_CONTINUE)

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
drivers/staging/lustre/lustre/llite/llite_internal.h | 6 ------
1 file changed, 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 5f918e3..528af90 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -57,12 +57,6 @@
#define VM_FAULT_RETRY 0
#endif

-/* Kernel 3.1 kills LOOKUP_CONTINUE, LOOKUP_PARENT is equivalent to it.
- * seem kernel commit 49084c3bb2055c401f3493c13edae14d49128ca0 */
-#ifndef LOOKUP_CONTINUE
-#define LOOKUP_CONTINUE LOOKUP_PARENT
-#endif
-
/** Only used on client-side for indicating the tail of dir hash/offset. */
#define LL_DIR_END_OFF 0x7fffffffffffffffULL
#define LL_DIR_END_OFF_32BIT 0x7fffffffUL
--
2.1.4

2015-05-13 22:26:20

by Al Viro

Subject: [PATCH 03/21] lustre: kill unused helper

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
.../staging/lustre/lustre/include/linux/lustre_compat25.h | 15 ---------------
1 file changed, 15 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h b/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
index 3925db1..513c81f 100644
--- a/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
+++ b/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
@@ -189,22 +189,7 @@ static inline int ll_quota_off(struct super_block *sb, int off, int remount)
#endif


-
-/*
- * After 3.1, kernel's nameidata.intent.open.flags is different
- * with lustre's lookup_intent.it_flags, as lustre's it_flags'
- * lower bits equal to FMODE_xxx while kernel doesn't transliterate
- * lower bits of nameidata.intent.open.flags to FMODE_xxx.
- * */
#include <linux/version.h>
-static inline int ll_namei_to_lookup_intent_flag(int flag)
-{
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 1, 0)
- flag = (flag & ~O_ACCMODE) | OPEN_FMODE(flag);
-#endif
- return flag;
-}
-
#include <linux/fs.h>

# define ll_umode_t umode_t
--
2.1.4

2015-05-13 22:26:24

by Al Viro

Subject: [PATCH 04/21] get rid of assorted nameidata-related debris

From: Al Viro <[email protected]>

pointless forward declarations, stale comments

Signed-off-by: Al Viro <[email protected]>
---
fs/9p/vfs_inode.c | 3 +--
fs/9p/vfs_inode_dotl.c | 3 +--
fs/ecryptfs/inode.c | 3 +--
fs/ntfs/namei.c | 2 +-
include/linux/fs.h | 1 -
include/linux/namei.h | 1 -
include/linux/sched.h | 1 +
7 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 271f51a..510040b 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1226,8 +1226,7 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
/**
* v9fs_vfs_follow_link - follow a symlink path
* @dentry: dentry for symlink
- * @nd: nameidata
- *
+ * @cookie: place to pass the data to put_link()
*/

static const char *v9fs_vfs_follow_link(struct dentry *dentry, void **cookie)
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 16658ed..09e4433 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -905,8 +905,7 @@ error:
/**
* v9fs_vfs_follow_link_dotl - follow a symlink path
* @dentry: dentry for symlink
- * @nd: nameidata
- *
+ * @cookie: place to pass the data to put_link()
*/

static const char *
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 73d20ae..3c4db11 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -170,7 +170,6 @@ out_unlock:
* @directory_inode: inode of the new file's dentry's parent in ecryptfs
* @ecryptfs_dentry: New file's dentry in ecryptfs
* @mode: The mode of the new file
- * @nd: nameidata of ecryptfs' parent's dentry & vfsmount
*
* Creates the underlying file and the eCryptfs inode which will link to
* it. It will also update the eCryptfs directory inode to mimic the
@@ -384,7 +383,7 @@ static int ecryptfs_lookup_interpose(struct dentry *dentry,
* ecryptfs_lookup
* @ecryptfs_dir_inode: The eCryptfs directory inode
* @ecryptfs_dentry: The eCryptfs dentry that we are looking up
- * @ecryptfs_nd: nameidata; may be NULL
+ * @flags: lookup flags
*
* Find a file on disk. If the file does not exist, then we'll add it to the
* dentry cache and continue on to read it from the disk.
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 0f35b80..443abec 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -35,7 +35,7 @@
* ntfs_lookup - find the inode represented by a dentry in a directory inode
* @dir_ino: directory inode in which to look for the inode
* @dent: dentry representing the inode to look for
- * @nd: lookup nameidata
+ * @flags: lookup flags
*
* In short, ntfs_lookup() looks for the inode represented by the dentry @dent
* in the directory inode @dir_ino and if found attaches the inode to the
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1426c43..b577e80 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -38,7 +38,6 @@ struct backing_dev_info;
struct export_operations;
struct hd_geometry;
struct iovec;
-struct nameidata;
struct kiocb;
struct kobject;
struct pipe_inode_info;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d756304..1208e48 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -7,7 +7,6 @@
#include <linux/path.h>

struct vfsmount;
-struct nameidata;

enum { MAX_NESTED_LINKS = 8 };

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f6c9b69..a1158c9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -132,6 +132,7 @@ struct fs_struct;
struct perf_event_context;
struct blk_plug;
struct filename;
+struct nameidata;

#define VMACACHE_BITS 2
#define VMACACHE_SIZE (1U << VMACACHE_BITS)
--
2.1.4

2015-05-13 22:28:55

by Al Viro

Subject: [PATCH 05/21] Documentation: remove outdated information from automount-support.txt

From: NeilBrown <[email protected]>

The guidelines for adding automount support to a filesystem
in filesystems/automount-support.txt are out of date.
filesystems/autofs4.txt contains more current text, so replace
the out-of-date content with a reference to that.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
Documentation/filesystems/automount-support.txt | 51 +++++++------------------
1 file changed, 13 insertions(+), 38 deletions(-)

diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 7cac200..7eb762e 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -1,41 +1,15 @@
-Support is available for filesystems that wish to do automounting support (such
-as kAFS which can be found in fs/afs/). This facility includes allowing
-in-kernel mounts to be performed and mountpoint degradation to be
-requested. The latter can also be requested by userspace.
+Support is available for filesystems that wish to do automounting
+support (such as kAFS which can be found in fs/afs/ and NFS in
+fs/nfs/). This facility includes allowing in-kernel mounts to be
+performed and mountpoint degradation to be requested. The latter can
+also be requested by userspace.


======================
IN-KERNEL AUTOMOUNTING
======================

-A filesystem can now mount another filesystem on one of its directories by the
-following procedure:
-
- (1) Give the directory a follow_link() operation.
-
- When the directory is accessed, the follow_link op will be called, and
- it will be provided with the location of the mountpoint in the nameidata
- structure (vfsmount and dentry).
-
- (2) Have the follow_link() op do the following steps:
-
- (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
- superblock and gain a vfsmount structure representing it.
-
- (b) Copy the nameidata provided as an argument and substitute the dentry
- argument into it the copy.
-
- (c) Call do_add_mount() to install the new vfsmount into the namespace's
- mountpoint tree, thus making it accessible to userspace. Use the
- nameidata set up in (b) as the destination.
-
- If the mountpoint will be automatically expired, then do_add_mount()
- should also be given the location of an expiration list (see further
- down).
-
- (d) Release the path in the nameidata argument and substitute in the new
- vfsmount and its root dentry. The ref counts on these will need
- incrementing.
+See section "Mount Traps" of Documentation/filesystems/autofs4.txt

Then from userspace, you can just do something like:

@@ -61,17 +35,18 @@ AUTOMATIC MOUNTPOINT EXPIRY
===========================

Automatic expiration of mountpoints is easy, provided you've mounted the
-mountpoint to be expired in the automounting procedure outlined above.
+mountpoint to be expired in the automounting procedure outlined separately.

To do expiration, you need to follow these steps:

- (3) Create at least one list off which the vfsmounts to be expired can be
- hung. Access to this list will be governed by the vfsmount_lock.
+ (1) Create at least one list off which the vfsmounts to be expired can be
+ hung.

- (4) In step (2c) above, the call to do_add_mount() should be provided with a
- pointer to this list. It will hang the vfsmount off of it if it succeeds.
+ (2) When a new mountpoint is created in the ->d_automount method, add
+ the mnt to the list using mnt_set_expiry()
+ mnt_set_expiry(newmnt, &afs_vfsmounts);

- (5) When you want mountpoints to be expired, call mark_mounts_for_expiry()
+ (3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
with a pointer to this list. This will process the list, marking every
vfsmount thereon for potential expiry on the next call.

--
2.1.4

2015-05-13 22:28:23

by Al Viro

Subject: [PATCH 06/21] namei: be careful with mountpoint crossings in follow_dotdot_rcu()

From: Al Viro <[email protected]>

Otherwise we risk a hard error where a non-lazy restart would be the right
thing to do; it's a very narrow race with mount --move and most of the time it
ends up being completely harmless, but it's possible to construct a case where
we get a bogus hard error instead of falling back to non-lazy walk...

For one thing, when crossing _into_ overmount of parent we need to check for
mount_lock bumps when we get NULL from __lookup_mnt() as well.

For another, and less exotically, we need to make sure that the data fetched
in follow_up_rcu() had been consistent. ->mnt_mountpoint is pinned for as
long as it is a mountpoint, but we need to check mount_lock after fetching
to verify that.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 51 +++++++++++++++++++++------------------------------
1 file changed, 21 insertions(+), 30 deletions(-)
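
The ordering described above (fetch the mount data, validate it against
the sequence counter, and only then use it) can be sketched in userspace.
This is a single-threaded toy, not the kernel code: struct model_seq,
struct model_mount and every model_*() function are hypothetical stand-ins
for mount_lock, struct mount and the seqlock helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy sequence counter; a writer is assumed to bump it on every change. */
struct model_seq { unsigned sequence; };

static unsigned model_read_begin(const struct model_seq *s)
{
	return s->sequence;
}

static int model_read_retry(const struct model_seq *s, unsigned start)
{
	return s->sequence != start;
}

/* Stand-in for struct mount: parent link plus an opaque mountpoint id. */
struct model_mount {
	struct model_mount *parent;
	int mountpoint;
};

/*
 * Step from a mount to its mountpoint in the parent mount.
 * Returns 1 if we moved up, 0 if this is the topmost mount, and
 * -ECHILD if the counter changed after the fields were fetched, in
 * which case the caller should restart in non-lazy mode.
 */
static int model_follow_up(const struct model_seq *mount_seq, unsigned m_seq,
			   const struct model_mount **mnt, int *where)
{
	const struct model_mount *parent;
	int mountpoint;

	assert(mnt != NULL && *mnt != NULL && where != NULL);
	parent = (*mnt)->parent;
	mountpoint = (*mnt)->mountpoint;

	/* Validate only after both fields have been fetched. */
	if (model_read_retry(mount_seq, m_seq))
		return -ECHILD;
	if (parent == *mnt)
		return 0;
	*mnt = parent;
	*where = mountpoint;
	return 1;
}
```

The point of the sketch is that a changed counter yields -ECHILD, a soft
"retry without RCU" result, rather than a result computed from fields that
may no longer belong together.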

diff --git a/fs/namei.c b/fs/namei.c
index e736722..e907e4b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1017,21 +1017,6 @@ const char *get_link(struct nameidata *nd)
return res;
}

-static int follow_up_rcu(struct path *path)
-{
- struct mount *mnt = real_mount(path->mnt);
- struct mount *parent;
- struct dentry *mountpoint;
-
- parent = mnt->mnt_parent;
- if (&parent->mnt == path->mnt)
- return 0;
- mountpoint = mnt->mnt_mountpoint;
- path->dentry = mountpoint;
- path->mnt = &parent->mnt;
- return 1;
-}
-
/*
* follow_up - Find the mountpoint of path's vfsmount
*
@@ -1288,10 +1273,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
set_root_rcu(nd);

while (1) {
- if (nd->path.dentry == nd->root.dentry &&
- nd->path.mnt == nd->root.mnt) {
+ if (path_equal(&nd->path, &nd->root))
break;
- }
if (nd->path.dentry != nd->path.mnt->mnt_root) {
struct dentry *old = nd->path.dentry;
struct dentry *parent = old->d_parent;
@@ -1299,34 +1282,42 @@ static int follow_dotdot_rcu(struct nameidata *nd)

inode = parent->d_inode;
seq = read_seqcount_begin(&parent->d_seq);
- if (read_seqcount_retry(&old->d_seq, nd->seq))
- goto failed;
+ if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+ return -ECHILD;
nd->path.dentry = parent;
nd->seq = seq;
break;
+ } else {
+ struct mount *mnt = real_mount(nd->path.mnt);
+ struct mount *mparent = mnt->mnt_parent;
+ struct dentry *mountpoint = mnt->mnt_mountpoint;
+ struct inode *inode2 = mountpoint->d_inode;
+ unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
+ if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+ return -ECHILD;
+ if (&mparent->mnt == nd->path.mnt)
+ break;
+ /* we know that mountpoint was pinned */
+ nd->path.dentry = mountpoint;
+ nd->path.mnt = &mparent->mnt;
+ inode = inode2;
+ nd->seq = seq;
}
- if (!follow_up_rcu(&nd->path))
- break;
- inode = nd->path.dentry->d_inode;
- nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
}
- while (d_mountpoint(nd->path.dentry)) {
+ while (unlikely(d_mountpoint(nd->path.dentry))) {
struct mount *mounted;
mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
+ if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+ return -ECHILD;
if (!mounted)
break;
nd->path.mnt = &mounted->mnt;
nd->path.dentry = mounted->mnt.mnt_root;
inode = nd->path.dentry->d_inode;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
- if (read_seqretry(&mount_lock, nd->m_seq))
- goto failed;
}
nd->inode = inode;
return 0;
-
-failed:
- return -ECHILD;
}

/*
--
2.1.4

2015-05-13 22:29:44

by Al Viro

Subject: [PATCH 07/21] namei: uninline set_root{,_rcu}()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e907e4b..ffa2e13 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -776,12 +776,12 @@ static int complete_walk(struct nameidata *nd)
return status;
}

-static __always_inline void set_root(struct nameidata *nd)
+static void set_root(struct nameidata *nd)
{
get_fs_root(current->fs, &nd->root);
}

-static __always_inline unsigned set_root_rcu(struct nameidata *nd)
+static unsigned set_root_rcu(struct nameidata *nd)
{
struct fs_struct *fs = current->fs;
unsigned seq;
--
2.1.4

2015-05-13 22:31:41

by Al Viro

Subject: [PATCH 08/21] namei: pass the struct path to store the result down into path_lookupat()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 72 +++++++++++++++++++++++++++++---------------------------------
1 file changed, 34 insertions(+), 38 deletions(-)
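
A rough userspace model of the hand-off this patch introduces in
path_lookupat(): on success the result is copied out and the walker's own
copy is cleared, so the unconditional cleanup that follows has nothing left
to drop. The types and the model_*() names are hypothetical, and plain
integer reference counts stand in for the real dentry/vfsmount references.

```c
#include <assert.h>
#include <stddef.h>

struct model_obj { int refs; };
struct model_path { struct model_obj *mnt, *dentry; };
struct model_nd { struct model_path path; };

/* Drop whatever references the path still holds; NULL members are skipped. */
static void model_path_put(struct model_path *p)
{
	if (p->mnt)
		p->mnt->refs--;
	if (p->dentry)
		p->dentry->refs--;
	p->mnt = NULL;
	p->dentry = NULL;
}

/* 'err' plays the role of the walk result computed earlier in the function. */
static int model_lookupat(struct model_nd *nd, int err, struct model_path *out)
{
	assert(nd != NULL && out != NULL);
	if (!err) {
		*out = nd->path;		/* references move to the caller */
		nd->path.mnt = NULL;
		nd->path.dentry = NULL;
	}
	model_path_put(&nd->path);		/* cleanup runs on both paths */
	return err;
}
```

With this shape the caller owns the result on success and must release it
itself, while on failure everything has already been dropped.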

diff --git a/fs/namei.c b/fs/namei.c
index ffa2e13..937ec33 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2084,8 +2084,8 @@ static inline int lookup_last(struct nameidata *nd)
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-static int path_lookupat(int dfd, const struct filename *name,
- unsigned int flags, struct nameidata *nd)
+static int path_lookupat(int dfd, const struct filename *name, unsigned flags,
+ struct nameidata *nd, struct path *path)
{
const char *s = path_init(dfd, name, flags, nd);
int err;
@@ -2106,27 +2106,31 @@ static int path_lookupat(int dfd, const struct filename *name,
if (!err && nd->flags & LOOKUP_DIRECTORY)
if (!d_can_lookup(nd->path.dentry))
err = -ENOTDIR;
- if (err)
- terminate_walk(nd);
-
+ if (!err) {
+ *path = nd->path;
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ }
+ terminate_walk(nd);
path_cleanup(nd);
return err;
}

-static int filename_lookup(int dfd, struct filename *name,
- unsigned int flags, struct nameidata *nd)
+static int filename_lookup(int dfd, struct filename *name, unsigned flags,
+ struct nameidata *nd, struct path *path)
{
int retval;
struct nameidata *saved_nd = set_nameidata(nd);

- retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+ retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd, path);
if (unlikely(retval == -ECHILD))
- retval = path_lookupat(dfd, name, flags, nd);
+ retval = path_lookupat(dfd, name, flags, nd, path);
if (unlikely(retval == -ESTALE))
- retval = path_lookupat(dfd, name, flags | LOOKUP_REVAL, nd);
+ retval = path_lookupat(dfd, name, flags | LOOKUP_REVAL,
+ nd, path);

if (likely(!retval))
- audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+ audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
restore_nameidata(saved_nd);
return retval;
}
@@ -2207,10 +2211,8 @@ int kern_path(const char *name, unsigned int flags, struct path *path)
int res = PTR_ERR(filename);

if (!IS_ERR(filename)) {
- res = filename_lookup(AT_FDCWD, filename, flags, &nd);
+ res = filename_lookup(AT_FDCWD, filename, flags, &nd, path);
putname(filename);
- if (!res)
- *path = nd.path;
}
return res;
}
@@ -2239,9 +2241,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
nd.root.dentry = dentry;
nd.root.mnt = mnt;
err = filename_lookup(AT_FDCWD, filename,
- flags | LOOKUP_ROOT, &nd);
- if (!err)
- *path = nd.path;
+ flags | LOOKUP_ROOT, &nd, path);
putname(filename);
}
return err;
@@ -2309,10 +2309,8 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,

BUG_ON(flags & LOOKUP_PARENT);

- err = filename_lookup(dfd, tmp, flags, &nd);
+ err = filename_lookup(dfd, tmp, flags, &nd, path);
putname(tmp);
- if (!err)
- *path = nd.path;
}
return err;
}
@@ -3259,44 +3257,42 @@ static int do_tmpfile(int dfd, struct filename *pathname,
struct file *file, int *opened)
{
static const struct qstr name = QSTR_INIT("/", 1);
- struct dentry *dentry, *child;
+ struct dentry *child;
struct inode *dir;
+ struct path path;
int error = path_lookupat(dfd, pathname,
- flags | LOOKUP_DIRECTORY, nd);
+ flags | LOOKUP_DIRECTORY, nd, &path);
if (unlikely(error))
return error;
- error = mnt_want_write(nd->path.mnt);
+ error = mnt_want_write(path.mnt);
if (unlikely(error))
goto out;
+ dir = path.dentry->d_inode;
/* we want directory to be writable */
- error = inode_permission(nd->inode, MAY_WRITE | MAY_EXEC);
+ error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
if (error)
goto out2;
- dentry = nd->path.dentry;
- dir = dentry->d_inode;
if (!dir->i_op->tmpfile) {
error = -EOPNOTSUPP;
goto out2;
}
- child = d_alloc(dentry, &name);
+ child = d_alloc(path.dentry, &name);
if (unlikely(!child)) {
error = -ENOMEM;
goto out2;
}
- nd->flags &= ~LOOKUP_DIRECTORY;
- nd->flags |= op->intent;
- dput(nd->path.dentry);
- nd->path.dentry = child;
- error = dir->i_op->tmpfile(dir, nd->path.dentry, op->mode);
+ dput(path.dentry);
+ path.dentry = child;
+ error = dir->i_op->tmpfile(dir, child, op->mode);
if (error)
goto out2;
- audit_inode(pathname, nd->path.dentry, 0);
+ audit_inode(pathname, child, 0);
/* Don't check for other permissions, the inode was just created */
- error = may_open(&nd->path, MAY_OPEN, op->open_flag);
+ error = may_open(&path, MAY_OPEN, op->open_flag);
if (error)
goto out2;
- file->f_path.mnt = nd->path.mnt;
- error = finish_open(file, nd->path.dentry, NULL, opened);
+ file->f_path.mnt = path.mnt;
+ error = finish_open(file, child, NULL, opened);
if (error)
goto out2;
error = open_check_o_direct(file);
@@ -3309,9 +3305,9 @@ static int do_tmpfile(int dfd, struct filename *pathname,
spin_unlock(&inode->i_lock);
}
out2:
- mnt_drop_write(nd->path.mnt);
+ mnt_drop_write(path.mnt);
out:
- path_put(&nd->path);
+ path_put(&path);
return error;
}

--
2.1.4

2015-05-13 22:26:46

by Al Viro

Subject: [PATCH 09/21] namei: move putname() call into filename_lookup()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 38 +++++++++++++++-----------------------
1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 937ec33..17350c0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2132,6 +2132,7 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,
if (likely(!retval))
audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
restore_nameidata(saved_nd);
+ putname(name);
return retval;
}

@@ -2208,13 +2209,9 @@ int kern_path(const char *name, unsigned int flags, struct path *path)
{
struct nameidata nd;
struct filename *filename = getname_kernel(name);
- int res = PTR_ERR(filename);
-
- if (!IS_ERR(filename)) {
- res = filename_lookup(AT_FDCWD, filename, flags, &nd, path);
- putname(filename);
- }
- return res;
+ if (IS_ERR(filename))
+ return PTR_ERR(filename);
+ return filename_lookup(AT_FDCWD, filename, flags, &nd, path);
}
EXPORT_SYMBOL(kern_path);

@@ -2230,21 +2227,19 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
const char *name, unsigned int flags,
struct path *path)
{
+ struct nameidata nd;
struct filename *filename = getname_kernel(name);
- int err = PTR_ERR(filename);

BUG_ON(flags & LOOKUP_PARENT);

+ if (IS_ERR(filename))
+ return PTR_ERR(filename);
+
+ nd.root.dentry = dentry;
+ nd.root.mnt = mnt;
/* the first argument of filename_lookup() is ignored with LOOKUP_ROOT */
- if (!IS_ERR(filename)) {
- struct nameidata nd;
- nd.root.dentry = dentry;
- nd.root.mnt = mnt;
- err = filename_lookup(AT_FDCWD, filename,
+ return filename_lookup(AT_FDCWD, filename,
flags | LOOKUP_ROOT, &nd, path);
- putname(filename);
- }
- return err;
}
EXPORT_SYMBOL(vfs_path_lookup);

@@ -2304,15 +2299,12 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
{
struct nameidata nd;
struct filename *tmp = getname_flags(name, flags, empty);
- int err = PTR_ERR(tmp);
- if (!IS_ERR(tmp)) {
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);

- BUG_ON(flags & LOOKUP_PARENT);
+ BUG_ON(flags & LOOKUP_PARENT);

- err = filename_lookup(dfd, tmp, flags, &nd, path);
- putname(tmp);
- }
- return err;
+ return filename_lookup(dfd, tmp, flags, &nd, path);
}

int user_path_at(int dfd, const char __user *name, unsigned flags,
--
2.1.4

2015-05-13 22:29:40

by Al Viro

Subject: [PATCH 10/21] namei: shift nameidata inside filename_lookup()

From: Al Viro <[email protected]>

pass root instead; non-NULL => copy to nd.root and
set LOOKUP_ROOT in flags

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
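
The "non-NULL root" convention from the description, reduced to a
standalone sketch. MODEL_LOOKUP_ROOT, the structs and model_prepare() are
illustrative names with an arbitrary flag value; they are not the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LOOKUP_ROOT 0x2000u	/* hypothetical flag bit */

struct model_path { const char *mnt; const char *dentry; };
struct model_nd { struct model_path root; unsigned flags; };

/*
 * A NULL root means "use the default root"; a non-NULL root is copied
 * into the walker state and the flag is set on the caller's behalf.
 */
static unsigned model_prepare(struct model_nd *nd, unsigned flags,
			      const struct model_path *root)
{
	assert(nd != NULL);
	if (root) {
		nd->root = *root;
		flags |= MODEL_LOOKUP_ROOT;
	}
	nd->flags = flags;
	return flags;
}
```

Callers then express intent through one pointer argument instead of
filling in walker state and passing a matching flag by hand.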

diff --git a/fs/namei.c b/fs/namei.c
index 17350c0..abf65b0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2117,17 +2117,20 @@ static int path_lookupat(int dfd, const struct filename *name, unsigned flags,
}

static int filename_lookup(int dfd, struct filename *name, unsigned flags,
- struct nameidata *nd, struct path *path)
+ struct path *path, struct path *root)
{
int retval;
- struct nameidata *saved_nd = set_nameidata(nd);
-
- retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd, path);
+ struct nameidata nd, *saved_nd = set_nameidata(&nd);
+ if (unlikely(root)) {
+ nd.root = *root;
+ flags |= LOOKUP_ROOT;
+ }
+ retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, &nd, path);
if (unlikely(retval == -ECHILD))
- retval = path_lookupat(dfd, name, flags, nd, path);
+ retval = path_lookupat(dfd, name, flags, &nd, path);
if (unlikely(retval == -ESTALE))
retval = path_lookupat(dfd, name, flags | LOOKUP_REVAL,
- nd, path);
+ &nd, path);

if (likely(!retval))
audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
@@ -2207,11 +2210,10 @@ out:

int kern_path(const char *name, unsigned int flags, struct path *path)
{
- struct nameidata nd;
struct filename *filename = getname_kernel(name);
if (IS_ERR(filename))
return PTR_ERR(filename);
- return filename_lookup(AT_FDCWD, filename, flags, &nd, path);
+ return filename_lookup(AT_FDCWD, filename, flags, path, NULL);
}
EXPORT_SYMBOL(kern_path);

@@ -2227,7 +2229,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
const char *name, unsigned int flags,
struct path *path)
{
- struct nameidata nd;
+ struct path root = {.mnt = mnt, .dentry = dentry};
struct filename *filename = getname_kernel(name);

BUG_ON(flags & LOOKUP_PARENT);
@@ -2235,11 +2237,8 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
if (IS_ERR(filename))
return PTR_ERR(filename);

- nd.root.dentry = dentry;
- nd.root.mnt = mnt;
- /* the first argument of filename_lookup() is ignored with LOOKUP_ROOT */
- return filename_lookup(AT_FDCWD, filename,
- flags | LOOKUP_ROOT, &nd, path);
+ /* the first argument of filename_lookup() is ignored with root */
+ return filename_lookup(AT_FDCWD, filename, flags , path, &root);
}
EXPORT_SYMBOL(vfs_path_lookup);

@@ -2297,14 +2296,13 @@ EXPORT_SYMBOL(lookup_one_len);
int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
struct path *path, int *empty)
{
- struct nameidata nd;
struct filename *tmp = getname_flags(name, flags, empty);
if (IS_ERR(tmp))
return PTR_ERR(tmp);

BUG_ON(flags & LOOKUP_PARENT);

- return filename_lookup(dfd, tmp, flags, &nd, path);
+ return filename_lookup(dfd, tmp, flags, path, NULL);
}

int user_path_at(int dfd, const char __user *name, unsigned flags,
--
2.1.4

2015-05-13 22:31:43

by Al Viro

Subject: [PATCH 11/21] namei: make filename_lookup() reject ERR_PTR() passed as name

From: Al Viro <[email protected]>

makes for much easier life in callers

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)
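
Why this makes life easier for callers can be shown with a small userspace
imitation of the ERR_PTR idiom. Everything named model_*() below is a
hypothetical stand-in (for ERR_PTR/IS_ERR/PTR_ERR, getname_kernel() and
filename_lookup() respectively), and the "lookup" merely measures the
string.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Producer: a heap copy of the name, or an encoded error. */
static char *model_getname(const char *name)
{
	char *copy;

	if (!name || !*name)
		return model_err_ptr(-ENOENT);
	copy = malloc(strlen(name) + 1);
	if (!copy)
		return model_err_ptr(-ENOMEM);
	strcpy(copy, name);
	return copy;
}

/* Consumer: rejects an encoded error, otherwise uses and releases the name. */
static int model_lookup(char *name, size_t *len)
{
	assert(len != NULL);
	if (model_is_err(name))
		return (int)model_ptr_err(name);
	*len = strlen(name);
	free(name);
	return 0;
}

/* The caller collapses to a single expression, with no error branch. */
static int model_kern_path(const char *name, size_t *len)
{
	return model_lookup(model_getname(name), len);
}
```

Because the consumer both checks for the encoded error and releases the
name, the wrapper needs no local variable and no cleanup of its own.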

diff --git a/fs/namei.c b/fs/namei.c
index abf65b0..6607a05 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2120,7 +2120,10 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct path *path, struct path *root)
{
int retval;
- struct nameidata nd, *saved_nd = set_nameidata(&nd);
+ struct nameidata nd, *saved_nd;
+ if (IS_ERR(name))
+ return PTR_ERR(name);
+ saved_nd = set_nameidata(&nd);
if (unlikely(root)) {
nd.root = *root;
flags |= LOOKUP_ROOT;
@@ -2210,10 +2213,8 @@ out:

int kern_path(const char *name, unsigned int flags, struct path *path)
{
- struct filename *filename = getname_kernel(name);
- if (IS_ERR(filename))
- return PTR_ERR(filename);
- return filename_lookup(AT_FDCWD, filename, flags, path, NULL);
+ return filename_lookup(AT_FDCWD, getname_kernel(name),
+ flags, path, NULL);
}
EXPORT_SYMBOL(kern_path);

@@ -2230,15 +2231,9 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
struct path *path)
{
struct path root = {.mnt = mnt, .dentry = dentry};
- struct filename *filename = getname_kernel(name);
-
- BUG_ON(flags & LOOKUP_PARENT);
-
- if (IS_ERR(filename))
- return PTR_ERR(filename);
-
/* the first argument of filename_lookup() is ignored with root */
- return filename_lookup(AT_FDCWD, filename, flags , path, &root);
+ return filename_lookup(AT_FDCWD, getname_kernel(name),
+ flags , path, &root);
}
EXPORT_SYMBOL(vfs_path_lookup);

@@ -2296,13 +2291,8 @@ EXPORT_SYMBOL(lookup_one_len);
int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
struct path *path, int *empty)
{
- struct filename *tmp = getname_flags(name, flags, empty);
- if (IS_ERR(tmp))
- return PTR_ERR(tmp);
-
- BUG_ON(flags & LOOKUP_PARENT);
-
- return filename_lookup(dfd, tmp, flags, path, NULL);
+ return filename_lookup(dfd, getname_flags(name, flags, empty),
+ flags, path, NULL);
}

int user_path_at(int dfd, const char __user *name, unsigned flags,
--
2.1.4

2015-05-13 22:28:21

by Al Viro

Subject: [PATCH 12/21] namei: shift nameidata down into filename_parentat()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 84 ++++++++++++++++++++++++++++++++------------------------------
1 file changed, 43 insertions(+), 41 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6607a05..dbd1a34 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2144,7 +2144,8 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
static int path_parentat(int dfd, const struct filename *name,
- unsigned int flags, struct nameidata *nd)
+ unsigned int flags, struct nameidata *nd,
+ struct path *parent)
{
const char *s = path_init(dfd, name, flags, nd);
int err;
@@ -2153,26 +2154,34 @@ static int path_parentat(int dfd, const struct filename *name,
err = link_path_walk(s, nd);
if (!err)
err = complete_walk(nd);
- if (err)
- terminate_walk(nd);
+ if (!err) {
+ *parent = nd->path;
+ nd->path.mnt = NULL;
+ nd->path.dentry = NULL;
+ }
+ terminate_walk(nd);
path_cleanup(nd);
return err;
}

static int filename_parentat(int dfd, struct filename *name,
- unsigned int flags, struct nameidata *nd)
+ unsigned int flags, struct path *parent,
+ struct qstr *last, int *type)
{
int retval;
- struct nameidata *saved_nd = set_nameidata(nd);
+ struct nameidata nd, *saved_nd = set_nameidata(&nd);

- retval = path_parentat(dfd, name, flags | LOOKUP_RCU, nd);
+ retval = path_parentat(dfd, name, flags | LOOKUP_RCU, &nd, parent);
if (unlikely(retval == -ECHILD))
- retval = path_parentat(dfd, name, flags, nd);
+ retval = path_parentat(dfd, name, flags, &nd, parent);
if (unlikely(retval == -ESTALE))
- retval = path_parentat(dfd, name, flags | LOOKUP_REVAL, nd);
-
- if (likely(!retval))
- audit_inode(name, nd->path.dentry, LOOKUP_PARENT);
+ retval = path_parentat(dfd, name, flags | LOOKUP_REVAL,
+ &nd, parent);
+ if (likely(!retval)) {
+ *last = nd.last;
+ *type = nd.last_type;
+ audit_inode(name, parent->dentry, LOOKUP_PARENT);
+ }
restore_nameidata(saved_nd);
return retval;
}
@@ -2181,31 +2190,30 @@ static int filename_parentat(int dfd, struct filename *name,
struct dentry *kern_path_locked(const char *name, struct path *path)
{
struct filename *filename = getname_kernel(name);
- struct nameidata nd;
+ struct qstr last;
+ int type;
struct dentry *d;
int err;

if (IS_ERR(filename))
return ERR_CAST(filename);

- err = filename_parentat(AT_FDCWD, filename, 0, &nd);
+ err = filename_parentat(AT_FDCWD, filename, 0, path, &last, &type);
if (err) {
d = ERR_PTR(err);
goto out;
}
- if (nd.last_type != LAST_NORM) {
- path_put(&nd.path);
+ if (type != LAST_NORM) {
+ path_put(path);
d = ERR_PTR(-EINVAL);
goto out;
}
- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- d = __lookup_hash(&nd.last, nd.path.dentry, 0);
+ mutex_lock_nested(&path->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ d = __lookup_hash(&last, path->dentry, 0);
if (IS_ERR(d)) {
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
- path_put(&nd.path);
- goto out;
+ mutex_unlock(&path->dentry->d_inode->i_mutex);
+ path_put(path);
}
- *path = nd.path;
out:
putname(filename);
return d;
@@ -2315,7 +2323,6 @@ user_path_parent(int dfd, const char __user *path,
int *type,
unsigned int flags)
{
- struct nameidata nd;
struct filename *s = getname(path);
int error;

@@ -2325,15 +2332,11 @@ user_path_parent(int dfd, const char __user *path,
if (IS_ERR(s))
return s;

- error = filename_parentat(dfd, s, flags, &nd);
+ error = filename_parentat(dfd, s, flags, parent, last, type);
if (error) {
putname(s);
- return ERR_PTR(error);
+ s = ERR_PTR(error);
}
- *parent = nd.path;
- *last = nd.last;
- *type = nd.last_type;
-
return s;
}

@@ -3392,7 +3395,8 @@ static struct dentry *filename_create(int dfd, struct filename *name,
struct path *path, unsigned int lookup_flags)
{
struct dentry *dentry = ERR_PTR(-EEXIST);
- struct nameidata nd;
+ struct qstr last;
+ int type;
int err2;
int error;
bool is_dir = (lookup_flags & LOOKUP_DIRECTORY);
@@ -3403,7 +3407,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
*/
lookup_flags &= LOOKUP_REVAL;

- error = filename_parentat(dfd, name, lookup_flags, &nd);
+ error = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
if (error)
return ERR_PTR(error);

@@ -3411,18 +3415,17 @@ static struct dentry *filename_create(int dfd, struct filename *name,
* Yucky last component or no last component at all?
* (foo/., foo/.., /////)
*/
- if (nd.last_type != LAST_NORM)
+ if (type != LAST_NORM)
goto out;
- nd.flags &= ~LOOKUP_PARENT;
- nd.flags |= LOOKUP_CREATE | LOOKUP_EXCL;

/* don't fail immediately if it's r/o, at least try to report other errors */
- err2 = mnt_want_write(nd.path.mnt);
+ err2 = mnt_want_write(path->mnt);
/*
* Do the final lookup.
*/
- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = __lookup_hash(&nd.last, nd.path.dentry, nd.flags);
+ lookup_flags |= LOOKUP_CREATE | LOOKUP_EXCL;
+ mutex_lock_nested(&path->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
+ dentry = __lookup_hash(&last, path->dentry, lookup_flags);
if (IS_ERR(dentry))
goto unlock;

@@ -3436,7 +3439,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
* all is fine. Let's be bastards - you had / on the end, you've
* been asking for (non-existent) directory. -ENOENT for you.
*/
- if (unlikely(!is_dir && nd.last.name[nd.last.len])) {
+ if (unlikely(!is_dir && last.name[last.len])) {
error = -ENOENT;
goto fail;
}
@@ -3444,17 +3447,16 @@ static struct dentry *filename_create(int dfd, struct filename *name,
error = err2;
goto fail;
}
- *path = nd.path;
return dentry;
fail:
dput(dentry);
dentry = ERR_PTR(error);
unlock:
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+ mutex_unlock(&path->dentry->d_inode->i_mutex);
if (!err2)
- mnt_drop_write(nd.path.mnt);
+ mnt_drop_write(path->mnt);
out:
- path_put(&nd.path);
+ path_put(path);
return dentry;
}

--
2.1.4

2015-05-13 22:26:51

by Al Viro

Subject: [PATCH 13/21] namei: saner calling conventions for filename_create()

From: Al Viro <[email protected]>

a) make it reject ERR_PTR() for name
b) make it putname(name) upon return in all other cases.

seriously simplifies the callers...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dbd1a34..4588e82 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3401,6 +3401,8 @@ static struct dentry *filename_create(int dfd, struct filename *name,
int error;
bool is_dir = (lookup_flags & LOOKUP_DIRECTORY);

+ if (IS_ERR(name))
+ return ERR_CAST(name);
/*
* Note that only LOOKUP_REVAL and LOOKUP_DIRECTORY matter here. Any
* other flags passed in are ignored!
@@ -3408,8 +3410,10 @@ static struct dentry *filename_create(int dfd, struct filename *name,
lookup_flags &= LOOKUP_REVAL;

error = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
- if (error)
+ if (error) {
+ putname(name);
return ERR_PTR(error);
+ }

/*
* Yucky last component or no last component at all?
@@ -3447,6 +3451,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
error = err2;
goto fail;
}
+ putname(name);
return dentry;
fail:
dput(dentry);
@@ -3457,20 +3462,15 @@ unlock:
mnt_drop_write(path->mnt);
out:
path_put(path);
+ putname(name);
return dentry;
}

struct dentry *kern_path_create(int dfd, const char *pathname,
struct path *path, unsigned int lookup_flags)
{
- struct filename *filename = getname_kernel(pathname);
- struct dentry *res;
-
- if (IS_ERR(filename))
- return ERR_CAST(filename);
- res = filename_create(dfd, filename, path, lookup_flags);
- putname(filename);
- return res;
+ return filename_create(dfd, getname_kernel(pathname),
+ path, lookup_flags);
}
EXPORT_SYMBOL(kern_path_create);

@@ -3486,13 +3486,7 @@ EXPORT_SYMBOL(done_path_create);
struct dentry *user_path_create(int dfd, const char __user *pathname,
struct path *path, unsigned int lookup_flags)
{
- struct filename *tmp = getname(pathname);
- struct dentry *res;
- if (IS_ERR(tmp))
- return ERR_CAST(tmp);
- res = filename_create(dfd, tmp, path, lookup_flags);
- putname(tmp);
- return res;
+ return filename_create(dfd, getname(pathname), path, lookup_flags);
}
EXPORT_SYMBOL(user_path_create);

--
2.1.4
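
The convention introduced in this patch (the callee rejects an ERR_PTR() name and otherwise drops the name on every exit) can be modeled in userspace. What follows is a minimal sketch under stated assumptions, not kernel code: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins for the kernel helpers, while struct name, get_name(), put_name(), create_len() and user_create() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical analogue of struct filename. */
struct name { char s[64]; };
static int live_names;		/* counts names not yet released */

/* Hypothetical analogue of getname(): a name or an ERR_PTR(). */
static struct name *get_name(const char *s)
{
	struct name *n;
	size_t len = s ? strlen(s) : 0;

	if (len == 0)
		return ERR_PTR(-ENOENT);
	if (len >= sizeof(n->s))
		return ERR_PTR(-ENAMETOOLONG);
	n = malloc(sizeof(*n));
	if (!n)
		return ERR_PTR(-ENOMEM);
	memcpy(n->s, s, len + 1);
	live_names++;
	return n;
}

/* Hypothetical analogue of putname(). */
static void put_name(struct name *n)
{
	free(n);
	live_names--;
}

/*
 * Callee in the new style:
 *  a) an ERR_PTR() argument is turned straight into the return value;
 *  b) otherwise the name is released on every exit, success or not.
 */
static long create_len(struct name *n)
{
	long len;

	if (IS_ERR(n))
		return PTR_ERR(n);
	if (strchr(n->s, '/')) {
		put_name(n);
		return -EINVAL;
	}
	len = (long)strlen(n->s);
	put_name(n);
	return len;
}

/* The caller collapses to one expression, as user_path_create() does. */
static long user_create(const char *s)
{
	return create_len(get_name(s));
}
```

With this shape the caller never sees the intermediate name, so it cannot forget to release it; the cost is that the callee's contract ("consumes its argument") has to be known to everyone who calls it.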

2015-05-13 22:26:55

by Al Viro

Subject: [PATCH 14/21] namei: saner calling conventions for filename_parentat()

From: Al Viro <[email protected]>

a) make it reject ERR_PTR() for name
b) make it putname(name) on all other failure exits
c) make it return name on success

again, simplifies the callers

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 60 ++++++++++++++++++++++--------------------------------------
1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4588e82..bd45300 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2164,13 +2164,16 @@ static int path_parentat(int dfd, const struct filename *name,
return err;
}

-static int filename_parentat(int dfd, struct filename *name,
+static struct filename *filename_parentat(int dfd, struct filename *name,
unsigned int flags, struct path *parent,
struct qstr *last, int *type)
{
int retval;
- struct nameidata nd, *saved_nd = set_nameidata(&nd);
+ struct nameidata nd, *saved_nd;

+ if (IS_ERR(name))
+ return name;
+ saved_nd = set_nameidata(&nd);
retval = path_parentat(dfd, name, flags | LOOKUP_RCU, &nd, parent);
if (unlikely(retval == -ECHILD))
retval = path_parentat(dfd, name, flags, &nd, parent);
@@ -2181,32 +2184,30 @@ static int filename_parentat(int dfd, struct filename *name,
*last = nd.last;
*type = nd.last_type;
audit_inode(name, parent->dentry, LOOKUP_PARENT);
+ } else {
+ putname(name);
+ name = ERR_PTR(retval);
}
restore_nameidata(saved_nd);
- return retval;
+ return name;
}

/* does lookup, returns the object with parent locked */
struct dentry *kern_path_locked(const char *name, struct path *path)
{
- struct filename *filename = getname_kernel(name);
+ struct filename *filename;
+ struct dentry *d;
struct qstr last;
int type;
- struct dentry *d;
- int err;

+ filename = filename_parentat(AT_FDCWD, getname_kernel(name), 0, path,
+ &last, &type);
if (IS_ERR(filename))
return ERR_CAST(filename);
-
- err = filename_parentat(AT_FDCWD, filename, 0, path, &last, &type);
- if (err) {
- d = ERR_PTR(err);
- goto out;
- }
- if (type != LAST_NORM) {
+ if (unlikely(type != LAST_NORM)) {
path_put(path);
- d = ERR_PTR(-EINVAL);
- goto out;
+ putname(filename);
+ return ERR_PTR(-EINVAL);
}
mutex_lock_nested(&path->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
d = __lookup_hash(&last, path->dentry, 0);
@@ -2214,7 +2215,6 @@ struct dentry *kern_path_locked(const char *name, struct path *path)
mutex_unlock(&path->dentry->d_inode->i_mutex);
path_put(path);
}
-out:
putname(filename);
return d;
}
@@ -2323,21 +2323,9 @@ user_path_parent(int dfd, const char __user *path,
int *type,
unsigned int flags)
{
- struct filename *s = getname(path);
- int error;
-
/* only LOOKUP_REVAL is allowed in extra flags */
- flags &= LOOKUP_REVAL;
-
- if (IS_ERR(s))
- return s;
-
- error = filename_parentat(dfd, s, flags, parent, last, type);
- if (error) {
- putname(s);
- s = ERR_PTR(error);
- }
- return s;
+ return filename_parentat(dfd, getname(path), flags & LOOKUP_REVAL,
+ parent, last, type);
}

/**
@@ -3401,25 +3389,21 @@ static struct dentry *filename_create(int dfd, struct filename *name,
int error;
bool is_dir = (lookup_flags & LOOKUP_DIRECTORY);

- if (IS_ERR(name))
- return ERR_CAST(name);
/*
* Note that only LOOKUP_REVAL and LOOKUP_DIRECTORY matter here. Any
* other flags passed in are ignored!
*/
lookup_flags &= LOOKUP_REVAL;

- error = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
- if (error) {
- putname(name);
- return ERR_PTR(error);
- }
+ name = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
+ if (IS_ERR(name))
+ return ERR_CAST(name);

/*
* Yucky last component or no last component at all?
* (foo/., foo/.., /////)
*/
- if (type != LAST_NORM)
+ if (unlikely(type != LAST_NORM))
goto out;

/* don't fail immediately if it's r/o, at least try to report other errors */
--
2.1.4
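
The "return name on success" rule means a caller needs a single IS_ERR() check to cover both a failed getname() and a failed parent lookup, and keeps the name alive for as long as it uses the last component. A minimal userspace sketch, assuming simplified stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(); struct name, get_name(), put_name(), parent_at() and last_len() are hypothetical names that exist only in this illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct name { char s[64]; };	/* hypothetical analogue of struct filename */
static int live_names;		/* counts names not yet released */

static struct name *get_name(const char *s)
{
	struct name *n;
	size_t len = s ? strlen(s) : 0;

	if (len == 0)
		return ERR_PTR(-ENOENT);
	if (len >= sizeof(n->s))
		return ERR_PTR(-ENAMETOOLONG);
	n = malloc(sizeof(*n));
	if (!n)
		return ERR_PTR(-ENOMEM);
	memcpy(n->s, s, len + 1);
	live_names++;
	return n;
}

static void put_name(struct name *n)
{
	free(n);
	live_names--;
}

/*
 * Toy analogue of the new filename_parentat():
 *  a) ERR_PTR() in, the same ERR_PTR() out;
 *  b) on failure the name is dropped and ERR_PTR(error) is returned;
 *  c) on success the name is handed back, still owned by the caller.
 * *last points into the name's buffer, so the name must outlive it.
 */
static struct name *parent_at(struct name *n, const char **last)
{
	const char *slash;

	if (IS_ERR(n))
		return n;
	slash = strrchr(n->s, '/');
	*last = slash ? slash + 1 : n->s;
	if (**last == '\0') {		/* trailing slash: nothing to return */
		put_name(n);
		return ERR_PTR(-EINVAL);
	}
	return n;
}

/* Caller: one IS_ERR() check, one put_name() on the success path. */
static long last_len(const char *path)
{
	const char *last;
	struct name *n = parent_at(get_name(path), &last);
	long len;

	if (IS_ERR(n))
		return PTR_ERR(n);
	len = (long)strlen(last);
	put_name(n);
	return len;
}
```

Returning the handle rather than an int lets the error from either stage travel through the same pointer, which is what allows kern_path_locked() and filename_create() in the patch to lose their separate error variables.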

2015-05-13 22:29:42

by Al Viro

Subject: [PATCH 15/21] namei: fold path_cleanup() into terminate_walk()

From: Al Viro <[email protected]>

they are always called next to each other; moreover,
terminate_walk() is more symmetrical that way.

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bd45300..59a4662 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -584,6 +584,10 @@ static void terminate_walk(struct nameidata *nd)
path_put(&nd->path);
for (i = 0; i < nd->depth; i++)
path_put(&nd->stack[i].link);
+ if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
+ path_put(&nd->root);
+ nd->root.mnt = NULL;
+ }
} else {
nd->flags &= ~LOOKUP_RCU;
if (!(nd->flags & LOOKUP_ROOT))
@@ -2049,14 +2053,6 @@ static const char *path_init(int dfd, const struct filename *name,
return ERR_PTR(-ECHILD);
}

-static void path_cleanup(struct nameidata *nd)
-{
- if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
- path_put(&nd->root);
- nd->root.mnt = NULL;
- }
-}
-
static const char *trailing_symlink(struct nameidata *nd)
{
const char *s;
@@ -2112,7 +2108,6 @@ static int path_lookupat(int dfd, const struct filename *name, unsigned flags,
nd->path.dentry = NULL;
}
terminate_walk(nd);
- path_cleanup(nd);
return err;
}

@@ -2160,7 +2155,6 @@ static int path_parentat(int dfd, const struct filename *name,
nd->path.dentry = NULL;
}
terminate_walk(nd);
- path_cleanup(nd);
return err;
}

@@ -2444,7 +2438,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
}
}
terminate_walk(nd);
- path_cleanup(nd);
return err;
}

@@ -3316,7 +3309,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
}
}
terminate_walk(nd);
- path_cleanup(nd);
out2:
if (!(opened & FILE_OPENED)) {
BUG_ON(!error);
--
2.1.4

2015-05-13 22:26:52

by Al Viro

Subject: [PATCH 16/21] namei: stash dfd and name into nameidata

From: Al Viro <[email protected]>

fewer arguments to pass around...

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 96 ++++++++++++++++++++++++++++++--------------------------------
1 file changed, 46 insertions(+), 50 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 59a4662..0c39113 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -498,6 +498,8 @@ struct nameidata {
struct qstr last;
struct path root;
struct inode *inode; /* path.dentry.d_inode */
+ struct filename *name;
+ int dfd;
unsigned int flags;
unsigned seq, m_seq, root_seq;
int last_type;
@@ -512,10 +514,13 @@ struct nameidata {
} *stack, internal[EMBEDDED_LEVELS];
};

-static struct nameidata *set_nameidata(struct nameidata *p)
+static struct nameidata *set_nameidata(struct nameidata *p, int dfd,
+ struct filename *name)
{
struct nameidata *old = current->nameidata;
p->stack = p->internal;
+ p->dfd = dfd;
+ p->name = name;
p->total_link_count = old ? old->total_link_count : 0;
current->nameidata = p;
return old;
@@ -1953,11 +1958,10 @@ OK:
}
}

-static const char *path_init(int dfd, const struct filename *name,
- unsigned int flags, struct nameidata *nd)
+static const char *path_init(struct nameidata *nd, unsigned flags)
{
int retval = 0;
- const char *s = name->name;
+ const char *s = nd->name->name;

nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
@@ -1997,7 +2001,7 @@ static const char *path_init(int dfd, const struct filename *name,
path_get(&nd->root);
}
nd->path = nd->root;
- } else if (dfd == AT_FDCWD) {
+ } else if (nd->dfd == AT_FDCWD) {
if (flags & LOOKUP_RCU) {
struct fs_struct *fs = current->fs;
unsigned seq;
@@ -2014,7 +2018,7 @@ static const char *path_init(int dfd, const struct filename *name,
}
} else {
/* Caller must check execute permissions on the starting path component */
- struct fd f = fdget_raw(dfd);
+ struct fd f = fdget_raw(nd->dfd);
struct dentry *dentry;

if (!f.file)
@@ -2080,10 +2084,9 @@ static inline int lookup_last(struct nameidata *nd)
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-static int path_lookupat(int dfd, const struct filename *name, unsigned flags,
- struct nameidata *nd, struct path *path)
+static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path)
{
- const char *s = path_init(dfd, name, flags, nd);
+ const char *s = path_init(nd, flags);
int err;

if (IS_ERR(s))
@@ -2118,17 +2121,16 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct nameidata nd, *saved_nd;
if (IS_ERR(name))
return PTR_ERR(name);
- saved_nd = set_nameidata(&nd);
+ saved_nd = set_nameidata(&nd, dfd, name);
if (unlikely(root)) {
nd.root = *root;
flags |= LOOKUP_ROOT;
}
- retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, &nd, path);
+ retval = path_lookupat(&nd, flags | LOOKUP_RCU, path);
if (unlikely(retval == -ECHILD))
- retval = path_lookupat(dfd, name, flags, &nd, path);
+ retval = path_lookupat(&nd, flags, path);
if (unlikely(retval == -ESTALE))
- retval = path_lookupat(dfd, name, flags | LOOKUP_REVAL,
- &nd, path);
+ retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);

if (likely(!retval))
audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
@@ -2138,11 +2140,10 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,
}

/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-static int path_parentat(int dfd, const struct filename *name,
- unsigned int flags, struct nameidata *nd,
+static int path_parentat(struct nameidata *nd, unsigned flags,
struct path *parent)
{
- const char *s = path_init(dfd, name, flags, nd);
+ const char *s = path_init(nd, flags);
int err;
if (IS_ERR(s))
return PTR_ERR(s);
@@ -2167,13 +2168,12 @@ static struct filename *filename_parentat(int dfd, struct filename *name,

if (IS_ERR(name))
return name;
- saved_nd = set_nameidata(&nd);
- retval = path_parentat(dfd, name, flags | LOOKUP_RCU, &nd, parent);
+ saved_nd = set_nameidata(&nd, dfd, name);
+ retval = path_parentat(&nd, flags | LOOKUP_RCU, parent);
if (unlikely(retval == -ECHILD))
- retval = path_parentat(dfd, name, flags, &nd, parent);
+ retval = path_parentat(&nd, flags, parent);
if (unlikely(retval == -ESTALE))
- retval = path_parentat(dfd, name, flags | LOOKUP_REVAL,
- &nd, parent);
+ retval = path_parentat(&nd, flags | LOOKUP_REVAL, parent);
if (likely(!retval)) {
*last = nd.last;
*type = nd.last_type;
@@ -2413,19 +2413,17 @@ done:

/**
* path_mountpoint - look up a path to be umounted
- * @dfd: directory file descriptor to start walk from
- * @name: full pathname to walk
- * @path: pointer to container for result
+ * @nameidata: lookup context
* @flags: lookup flags
+ * @path: pointer to container for result
*
* Look up the given name, but don't attempt to revalidate the last component.
* Returns 0 and "path" will be valid on success; Returns error otherwise.
*/
static int
-path_mountpoint(int dfd, const struct filename *name, struct path *path,
- struct nameidata *nd, unsigned int flags)
+path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
{
- const char *s = path_init(dfd, name, flags, nd);
+ const char *s = path_init(nd, flags);
int err;
if (IS_ERR(s))
return PTR_ERR(s);
@@ -2449,12 +2447,12 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- saved = set_nameidata(&nd);
- error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU);
+ saved = set_nameidata(&nd, dfd, name);
+ error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
if (unlikely(error == -ECHILD))
- error = path_mountpoint(dfd, name, path, &nd, flags);
+ error = path_mountpoint(&nd, flags, path);
if (unlikely(error == -ESTALE))
- error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL);
+ error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
if (likely(!error))
audit_inode(name, path->dentry, 0);
restore_nameidata(saved);
@@ -3215,8 +3213,7 @@ stale_open:
goto retry_lookup;
}

-static int do_tmpfile(int dfd, struct filename *pathname,
- struct nameidata *nd, int flags,
+static int do_tmpfile(struct nameidata *nd, unsigned flags,
const struct open_flags *op,
struct file *file, int *opened)
{
@@ -3224,8 +3221,7 @@ static int do_tmpfile(int dfd, struct filename *pathname,
struct dentry *child;
struct inode *dir;
struct path path;
- int error = path_lookupat(dfd, pathname,
- flags | LOOKUP_DIRECTORY, nd, &path);
+ int error = path_lookupat(nd, flags | LOOKUP_DIRECTORY, &path);
if (unlikely(error))
return error;
error = mnt_want_write(path.mnt);
@@ -3250,7 +3246,7 @@ static int do_tmpfile(int dfd, struct filename *pathname,
error = dir->i_op->tmpfile(dir, child, op->mode);
if (error)
goto out2;
- audit_inode(pathname, child, 0);
+ audit_inode(nd->name, child, 0);
/* Don't check for other permissions, the inode was just created */
error = may_open(&path, MAY_OPEN, op->open_flag);
if (error)
@@ -3275,8 +3271,8 @@ out:
return error;
}

-static struct file *path_openat(int dfd, struct filename *pathname,
- struct nameidata *nd, const struct open_flags *op, int flags)
+static struct file *path_openat(struct nameidata *nd,
+ const struct open_flags *op, unsigned flags)
{
const char *s;
struct file *file;
@@ -3290,17 +3286,17 @@ static struct file *path_openat(int dfd, struct filename *pathname,
file->f_flags = op->open_flag;

if (unlikely(file->f_flags & __O_TMPFILE)) {
- error = do_tmpfile(dfd, pathname, nd, flags, op, file, &opened);
+ error = do_tmpfile(nd, flags, op, file, &opened);
goto out2;
}

- s = path_init(dfd, pathname, flags, nd);
+ s = path_init(nd, flags);
if (IS_ERR(s)) {
put_filp(file);
return ERR_CAST(s);
}
while (!(error = link_path_walk(s, nd)) &&
- (error = do_last(nd, file, op, &opened, pathname)) > 0) {
+ (error = do_last(nd, file, op, &opened, nd->name)) > 0) {
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
s = trailing_symlink(nd);
if (IS_ERR(s)) {
@@ -3329,15 +3325,15 @@ out2:
struct file *do_filp_open(int dfd, struct filename *pathname,
const struct open_flags *op)
{
- struct nameidata nd, *saved_nd = set_nameidata(&nd);
+ struct nameidata nd, *saved_nd = set_nameidata(&nd, dfd, pathname);
int flags = op->lookup_flags;
struct file *filp;

- filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU);
+ filp = path_openat(&nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
- filp = path_openat(dfd, pathname, &nd, op, flags);
+ filp = path_openat(&nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
- filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL);
+ filp = path_openat(&nd, op, flags | LOOKUP_REVAL);
restore_nameidata(saved_nd);
return filp;
}
@@ -3360,12 +3356,12 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
if (unlikely(IS_ERR(filename)))
return ERR_CAST(filename);

- saved_nd = set_nameidata(&nd);
- file = path_openat(-1, filename, &nd, op, flags | LOOKUP_RCU);
+ saved_nd = set_nameidata(&nd, -1, filename);
+ file = path_openat(&nd, op, flags | LOOKUP_RCU);
if (unlikely(file == ERR_PTR(-ECHILD)))
- file = path_openat(-1, filename, &nd, op, flags);
+ file = path_openat(&nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
- file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL);
+ file = path_openat(&nd, op, flags | LOOKUP_REVAL);
restore_nameidata(saved_nd);
putname(filename);
return file;
--
2.1.4
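
Every wrapper touched by this patch repeats the same three-step ladder: try in RCU mode, fall back to a regular walk on -ECHILD, then retry with revalidation on -ESTALE. Once dfd and name live in the context, each retry passes only the context and the flags. A minimal userspace sketch of that ladder; struct walk_ctx, walk_once(), walk() and the WALK_* flags are hypothetical stand-ins, and the rcu_ok/stale knobs exist only to drive the toy:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for LOOKUP_RCU and LOOKUP_REVAL. */
#define WALK_RCU	0x1
#define WALK_REVAL	0x2

/* Hypothetical analogue of struct nameidata carrying dfd and name. */
struct walk_ctx {
	int dfd;
	const char *name;
	int rcu_ok;	/* toy knob: does the lockless attempt succeed? */
	int stale;	/* toy knob: does the walk need revalidation? */
	int attempts;	/* how many walks were started */
};

/* One walk attempt; it reads dfd and name from the context. */
static int walk_once(struct walk_ctx *ctx, unsigned flags)
{
	ctx->attempts++;
	if ((flags & WALK_RCU) && !ctx->rcu_ok)
		return -ECHILD;		/* cannot continue locklessly */
	if (ctx->stale && !(flags & WALK_REVAL))
		return -ESTALE;		/* cached data not trusted */
	return 0;
}

/* The ladder used by filename_lookup() and friends in the patch. */
static int walk(struct walk_ctx *ctx, unsigned flags)
{
	int err = walk_once(ctx, flags | WALK_RCU);

	if (err == -ECHILD)
		err = walk_once(ctx, flags);
	if (err == -ESTALE)
		err = walk_once(ctx, flags | WALK_REVAL);
	return err;
}
```

Note that the -ESTALE check follows the -ECHILD one without an else, so a stale result from either the lockless or the regular attempt leads to the revalidating retry.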

2015-05-13 22:26:26

by Al Viro

Subject: [PATCH 17/21] namei: trim do_last() arguments

From: Al Viro <[email protected]>

now that struct filename is stashed in nameidata we have no need to
pass it in

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0c39113..9259fa9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2990,7 +2990,7 @@ out_dput:
*/
static int do_last(struct nameidata *nd,
struct file *file, const struct open_flags *op,
- int *opened, struct filename *name)
+ int *opened)
{
struct dentry *dir = nd->path.dentry;
int open_flag = op->open_flag;
@@ -3037,7 +3037,7 @@ static int do_last(struct nameidata *nd,
if (error)
return error;

- audit_inode(name, dir, LOOKUP_PARENT);
+ audit_inode(nd->name, dir, LOOKUP_PARENT);
/* trailing slashes? */
if (unlikely(nd->last.name[nd->last.len]))
return -EISDIR;
@@ -3066,7 +3066,7 @@ retry_lookup:
!S_ISREG(file_inode(file)->i_mode))
will_truncate = false;

- audit_inode(name, file->f_path.dentry, 0);
+ audit_inode(nd->name, file->f_path.dentry, 0);
goto opened;
}

@@ -3083,7 +3083,7 @@ retry_lookup:
* create/update audit record if it already exists.
*/
if (d_is_positive(path.dentry))
- audit_inode(name, path.dentry, 0);
+ audit_inode(nd->name, path.dentry, 0);

/*
* If atomic_open() acquired write access it is dropped now due to
@@ -3141,7 +3141,7 @@ finish_open:
path_put(&save_parent);
return error;
}
- audit_inode(name, nd->path.dentry, 0);
+ audit_inode(nd->name, nd->path.dentry, 0);
error = -EISDIR;
if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
goto out;
@@ -3296,7 +3296,7 @@ static struct file *path_openat(struct nameidata *nd,
return ERR_CAST(s);
}
while (!(error = link_path_walk(s, nd)) &&
- (error = do_last(nd, file, op, &opened, nd->name)) > 0) {
+ (error = do_last(nd, file, op, &opened)) > 0) {
nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
s = trailing_symlink(nd);
if (IS_ERR(s)) {
--
2.1.4

2015-05-13 22:30:54

by Al Viro

Subject: [PATCH 18/21] inline user_path_parent()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9259fa9..995b56e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2310,7 +2310,7 @@ EXPORT_SYMBOL(user_path_at);
* allocated by getname. So we must hold the reference to it until all
* path-walking is complete.
*/
-static struct filename *
+static inline struct filename *
user_path_parent(int dfd, const char __user *path,
struct path *parent,
struct qstr *last,
--
2.1.4

2015-05-13 22:26:59

by Al Viro

[permalink] [raw]
Subject: [PATCH 19/21] inline user_path_create()

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 995b56e..23dd9e8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3455,7 +3455,7 @@ void done_path_create(struct path *path, struct dentry *dentry)
}
EXPORT_SYMBOL(done_path_create);

-struct dentry *user_path_create(int dfd, const char __user *pathname,
+inline struct dentry *user_path_create(int dfd, const char __user *pathname,
struct path *path, unsigned int lookup_flags)
{
return filename_create(dfd, getname(pathname), path, lookup_flags);
--
2.1.4

2015-05-13 22:29:19

by Al Viro

Subject: [PATCH 20/21] namei: move saved_nd pointer into struct nameidata

From: Al Viro <[email protected]>

these guys are always declared next to each other; might as well put
the former (pointer to previous instance) into the latter and simplify
the calling conventions for {set,restore}_nameidata()

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 46 ++++++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 23dd9e8..dead60d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -498,10 +498,8 @@ struct nameidata {
struct qstr last;
struct path root;
struct inode *inode; /* path.dentry.d_inode */
- struct filename *name;
- int dfd;
unsigned int flags;
- unsigned seq, m_seq, root_seq;
+ unsigned seq, m_seq;
int last_type;
unsigned depth;
int total_link_count;
@@ -512,23 +510,26 @@ struct nameidata {
struct inode *inode;
unsigned seq;
} *stack, internal[EMBEDDED_LEVELS];
+ struct filename *name;
+ struct nameidata *saved;
+ unsigned root_seq;
+ int dfd;
};

-static struct nameidata *set_nameidata(struct nameidata *p, int dfd,
- struct filename *name)
+static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
{
struct nameidata *old = current->nameidata;
p->stack = p->internal;
p->dfd = dfd;
p->name = name;
p->total_link_count = old ? old->total_link_count : 0;
+ p->saved = old;
current->nameidata = p;
- return old;
}

-static void restore_nameidata(struct nameidata *old)
+static void restore_nameidata(void)
{
- struct nameidata *now = current->nameidata;
+ struct nameidata *now = current->nameidata, *old = now->saved;

current->nameidata = old;
if (old)
@@ -2118,14 +2119,14 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct path *path, struct path *root)
{
int retval;
- struct nameidata nd, *saved_nd;
+ struct nameidata nd;
if (IS_ERR(name))
return PTR_ERR(name);
- saved_nd = set_nameidata(&nd, dfd, name);
if (unlikely(root)) {
nd.root = *root;
flags |= LOOKUP_ROOT;
}
+ set_nameidata(&nd, dfd, name);
retval = path_lookupat(&nd, flags | LOOKUP_RCU, path);
if (unlikely(retval == -ECHILD))
retval = path_lookupat(&nd, flags, path);
@@ -2134,7 +2135,7 @@ static int filename_lookup(int dfd, struct filename *name, unsigned flags,

if (likely(!retval))
audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
- restore_nameidata(saved_nd);
+ restore_nameidata();
putname(name);
return retval;
}
@@ -2164,11 +2165,11 @@ static struct filename *filename_parentat(int dfd, struct filename *name,
struct qstr *last, int *type)
{
int retval;
- struct nameidata nd, *saved_nd;
+ struct nameidata nd;

if (IS_ERR(name))
return name;
- saved_nd = set_nameidata(&nd, dfd, name);
+ set_nameidata(&nd, dfd, name);
retval = path_parentat(&nd, flags | LOOKUP_RCU, parent);
if (unlikely(retval == -ECHILD))
retval = path_parentat(&nd, flags, parent);
@@ -2182,7 +2183,7 @@ static struct filename *filename_parentat(int dfd, struct filename *name,
putname(name);
name = ERR_PTR(retval);
}
- restore_nameidata(saved_nd);
+ restore_nameidata();
return name;
}

@@ -2443,11 +2444,11 @@ static int
filename_mountpoint(int dfd, struct filename *name, struct path *path,
unsigned int flags)
{
- struct nameidata nd, *saved;
+ struct nameidata nd;
int error;
if (IS_ERR(name))
return PTR_ERR(name);
- saved = set_nameidata(&nd, dfd, name);
+ set_nameidata(&nd, dfd, name);
error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
if (unlikely(error == -ECHILD))
error = path_mountpoint(&nd, flags, path);
@@ -2455,7 +2456,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path,
error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
if (likely(!error))
audit_inode(name, path->dentry, 0);
- restore_nameidata(saved);
+ restore_nameidata();
putname(name);
return error;
}
@@ -3325,23 +3326,24 @@ out2:
struct file *do_filp_open(int dfd, struct filename *pathname,
const struct open_flags *op)
{
- struct nameidata nd, *saved_nd = set_nameidata(&nd, dfd, pathname);
+ struct nameidata nd;
int flags = op->lookup_flags;
struct file *filp;

+ set_nameidata(&nd, dfd, pathname);
filp = path_openat(&nd, op, flags | LOOKUP_RCU);
if (unlikely(filp == ERR_PTR(-ECHILD)))
filp = path_openat(&nd, op, flags);
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(&nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(saved_nd);
+ restore_nameidata();
return filp;
}

struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
const char *name, const struct open_flags *op)
{
- struct nameidata nd, *saved_nd;
+ struct nameidata nd;
struct file *file;
struct filename *filename;
int flags = op->lookup_flags | LOOKUP_ROOT;
@@ -3356,13 +3358,13 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
if (unlikely(IS_ERR(filename)))
return ERR_CAST(filename);

- saved_nd = set_nameidata(&nd, -1, filename);
+ set_nameidata(&nd, -1, filename);
file = path_openat(&nd, op, flags | LOOKUP_RCU);
if (unlikely(file == ERR_PTR(-ECHILD)))
file = path_openat(&nd, op, flags);
if (unlikely(file == ERR_PTR(-ESTALE)))
file = path_openat(&nd, op, flags | LOOKUP_REVAL);
- restore_nameidata(saved_nd);
+ restore_nameidata();
putname(filename);
return file;
}
--
2.1.4
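
Keeping the pointer to the previous instance inside the structure turns the set/restore pair into a small intrusive stack. A minimal userspace sketch, assuming a global current_ctx in place of current->nameidata; struct ctx, set_ctx(), restore_ctx() and nested_demo() are hypothetical names. The copy-back of total_link_count in restore_ctx() is an assumption about the part of restore_nameidata() that the diff context does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down analogue of struct nameidata. */
struct ctx {
	int total_link_count;
	struct ctx *saved;	/* previous instance, NULL for the outermost */
};

/* Stand-in for current->nameidata. */
static struct ctx *current_ctx;

/* Push: remember the previous instance inside the new one. */
static void set_ctx(struct ctx *p)
{
	struct ctx *old = current_ctx;

	p->total_link_count = old ? old->total_link_count : 0;
	p->saved = old;
	current_ctx = p;
}

/* Pop: no argument needed, the link lives in the structure itself. */
static void restore_ctx(void)
{
	struct ctx *now = current_ctx, *old = now->saved;

	current_ctx = old;
	if (old)	/* assumed: the count propagates back to the outer walk */
		old->total_link_count = now->total_link_count;
}

/* Nested use: the inner walk inherits the count and hands it back. */
static int nested_demo(void)
{
	struct ctx outer, inner;
	int seen;

	set_ctx(&outer);
	outer.total_link_count += 2;
	set_ctx(&inner);		/* starts at 2 */
	inner.total_link_count += 3;
	restore_ctx();			/* outer now holds 5 */
	seen = outer.total_link_count;
	restore_ctx();			/* back to no context at all */
	return seen;
}
```

The callers in the patch benefit in the same way as nested_demo(): one local variable instead of two, and no chance of passing the wrong saved pointer to the restore call.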

2015-05-13 22:27:03

by Al Viro

Subject: [PATCH 21/21] turn user_{path_at,path,lpath,path_dir}() into static inlines

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
fs/namei.c | 8 +-------
include/linux/namei.h | 34 ++++++++++++++++++++++++----------
2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dead60d..8ca1fbd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2297,13 +2297,7 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
return filename_lookup(dfd, getname_flags(name, flags, empty),
flags, path, NULL);
}
-
-int user_path_at(int dfd, const char __user *name, unsigned flags,
- struct path *path)
-{
- return user_path_at_empty(dfd, name, flags, path, NULL);
-}
-EXPORT_SYMBOL(user_path_at);
+EXPORT_SYMBOL(user_path_at_empty);

/*
* NB: most callers don't do anything directly with the reference to the
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 1208e48..d8c6334 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -1,12 +1,10 @@
#ifndef _LINUX_NAMEI_H
#define _LINUX_NAMEI_H

-#include <linux/dcache.h>
-#include <linux/errno.h>
-#include <linux/linkage.h>
+#include <linux/kernel.h>
#include <linux/path.h>
-
-struct vfsmount;
+#include <linux/fcntl.h>
+#include <linux/errno.h>

enum { MAX_NESTED_LINKS = 8 };

@@ -46,13 +44,29 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_ROOT 0x2000
#define LOOKUP_EMPTY 0x4000

-extern int user_path_at(int, const char __user *, unsigned, struct path *);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);

-#define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
-#define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)
-#define user_path_dir(name, path) \
- user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, path)
+static inline int user_path_at(int dfd, const char __user *name, unsigned flags,
+ struct path *path)
+{
+ return user_path_at_empty(dfd, name, flags, path, NULL);
+}
+
+static inline int user_path(const char __user *name, struct path *path)
+{
+ return user_path_at_empty(AT_FDCWD, name, LOOKUP_FOLLOW, path, NULL);
+}
+
+static inline int user_lpath(const char __user *name, struct path *path)
+{
+ return user_path_at_empty(AT_FDCWD, name, 0, path, NULL);
+}
+
+static inline int user_path_dir(const char __user *name, struct path *path)
+{
+ return user_path_at_empty(AT_FDCWD, name,
+ LOOKUP_FOLLOW | LOOKUP_DIRECTORY, path, NULL);
+}

extern int kern_path(const char *, unsigned, struct path *);

--
2.1.4
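
The commit message gives no rationale for replacing the macros, so the following is one reading of the change rather than the author's stated motive: static inline wrappers get typed parameters and can be used wherever a function can, while still compiling down to a direct call to the one exported function. A minimal userspace sketch; struct path here is a toy, and lookup_at_empty(), lookup_at(), lookup(), lookup_nofollow(), lookup_dir() and the FOLLOW/DIRECTORY/AT_CWD constants are hypothetical stand-ins for the kernel names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for LOOKUP_FOLLOW, LOOKUP_DIRECTORY, AT_FDCWD. */
#define FOLLOW		0x1
#define DIRECTORY	0x2
#define AT_CWD		(-100)

/* Toy result type; it only records what the lookup was asked to do. */
struct path {
	int dfd;
	unsigned flags;
	const char *name;
};

/* The single out-of-line entry point, like user_path_at_empty(). */
static int lookup_at_empty(int dfd, const char *name, unsigned flags,
			   struct path *path, int *empty)
{
	if (empty)
		*empty = (name[0] == '\0');
	path->dfd = dfd;
	path->flags = flags;
	path->name = name;
	return 0;
}

/* Typed wrappers: wrong argument types are diagnosed at the call site. */
static inline int lookup_at(int dfd, const char *name, unsigned flags,
			    struct path *path)
{
	return lookup_at_empty(dfd, name, flags, path, NULL);
}

static inline int lookup(const char *name, struct path *path)
{
	return lookup_at_empty(AT_CWD, name, FOLLOW, path, NULL);
}

static inline int lookup_nofollow(const char *name, struct path *path)
{
	return lookup_at_empty(AT_CWD, name, 0, path, NULL);
}

static inline int lookup_dir(const char *name, struct path *path)
{
	return lookup_at_empty(AT_CWD, name, FOLLOW | DIRECTORY, path, NULL);
}
```

In the patch the wrappers need AT_FDCWD and NULL to be visible from the header, which is consistent with the added includes of linux/fcntl.h and linux/kernel.h.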

2015-05-14 01:39:55

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Wed, May 13, 2015 at 3:25 PM, Al Viro <[email protected]> wrote:
> More on top of the current vfs.git#for-next (== the posted patchset
> with a couple of fixes): more fs/namei.c reorganization and stack footprint
> reduction (below 1Kb now). One interesting piece of that is that we don't
> touch current->fs->lock anymore - unlazy_walk() used to, but now we can
> get rid of that.

Ok. I don't see anything wrong here, but I have to admit that I'm also
at the point where I go "maybe this area should calm down a bit", and
where I'd prefer to not see more long patch-series. Even if most of
the patches seem to be fairly mechanical code movement and cleanup and
preparation, mistakes happen, and I just get worried.

So I think the series is good, but in particular if you're planning on
some more core changes (ie your "act on filename" callback thing), I
would really prefer that we stop at this point for the 4.2 window, and
make sure it's all stable.

And then your callback thing could be for 4.3.

That said, I'm not entirely convinced about a callback approach for
stat() and friends. I suspect only stat() is really critical enough to
warrant the whole "let's do it all in RCU mode", and if there's only
one case, then there's no need for the (*act) indirection - might as
well hardcode it.

But feel free to convince me. Again, I'd really prefer that to be
after the current work has been in a stable release and a well tested
base, though.

Linus

2015-05-14 03:30:44

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Wed, May 13, 2015 at 06:39:53PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 3:25 PM, Al Viro <[email protected]> wrote:
> > More on top of the current vfs.git#for-next (== the posted patchset
> > with a couple of fixes): more fs/namei.c reorganization and stack footprint
> > reduction (below 1Kb now). One interesting piece of that is that we don't
> > touch current->fs->lock anymore - unlazy_walk() used to, but now we can
> > get rid of that.
>
> Ok. I don't see anything wrong here, but I have to admit that I'm also
> at the point where I go "maybe this area should calm down a bit", and
> where I'd prefer to not see more long patch-series. Even if most of
> the patches seem to be fairly mechanical code movement and cleanup and
> preparation, mistakes happen, and I just get worried.
>
> So I think the series is good, but in particular if you're planning on
> some more core changes (ie your "act on filename" callback thing), I
> would really prefer that we stop at this point for the 4.2 window, and
> make sure it's all stable.

*nod*

It has been a fun three weeks, but at this point all the low-hanging fruit
is gone. There is more stuff visible there (lazy stat(), offloading
automounts via schedule_work(), perhaps RCU handling of non-fast symlinks),
but that'll take a lot more serious plotting[1] before it gets to the
stage where it can be implemented. In particular, automounts will require
discussing what exactly in the process' state is used for those - both
with autofs/NFS/AFS/CIFS folks and with Eric (what netns should be used
when we are crossing an NFSv4 referral point? Should it come from the
NFS mount we'd found the referral on, or from the process that has run
across it? There'd been a series from Ian around the interplay of
autofs with namespaces, and IIRC it stepped into similar-sounding areas;
it'll need to be looked into, etc.)

I doubt that we'll get it sorted out before 4.2, and this kind of stuff
*does* need exposure in -testing, as well as public review, so such
continuations are clear 4.3 fodder. I hope to get it figured out by the
time 4.3-rc1 comes, so that it could be dealt with early in the next cycle,
but for now I'm planning to pull the current #link_path_walk into #for-next
and let it soak there. Especially since there are other pending piles of
joy elsewhere - handling of d_move()-messed vfsmounts, for starters ;-/

> And then your callback thing could be for 4.3.
>
> That said, I'm not entirely convinced about a callback approach for
> stat() and friends. I suspect only stat() is really critical enough to
> warrant the whole "let's do it all in RCU mode", and if there's only
> one case, then there's no need for the (*act) indirection - might as
> well hardcode it.

Maybe... I'd like to see the profiles, TBH - especially getxattr() and
access() frequency on various loads. Sure, make(1) and cc(1) really care
about stat() very much, but I wouldn't be surprised if something like
httpd or samba would be hitting getxattr() a lot...

> But feel free to convince me. Again, I'd really prefer that to be
> after the current work has been in a stable release and a well tested
> base, though.

No arguments here...

[1] as in "which shitpiles are we likely to step into on the way there and
how far would detours take us"

2015-05-14 03:53:04

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
>
> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> access() frequency on various loads. Sure, make(1) and cc(1) really care
> about stat() very much, but I wouldn't be surprised if something like
> httpd or samba would be hitting getxattr() a lot...

So I haven't seen samba profiles in ages, but iirc we have more
serious problems than trying to speed up basic filename lookup.

At least long long ago, inode semaphore contention was a big deal,
largely due to readdir(). And readdir() itself, for that matter - we
have no good vfs-level readdir caching, so it all ends up serialized
on the inode semaphore, and it all goes all the way into the
filesystem to get the readdir data. And at least for ext4, readdir()
is slow anyway, because it doesn't use the page cache, it uses that
good old buffer cache, because of how ext4 does metadata journaling
etc.

Having readdir() caching at the VFS layer would likely be a really
good thing, but it's hard. It *might* be worth looking at the nfs4
code to see if we could possibly move some of that code into the vfs
layer, but the answer is likely "no", or at least "that's incredibly
painful".

Also, I *think* samba ends up basically doing most of the pathname
lookups from its own user-level cache, because of how it needs to
avoid symlinks and do the whole crazy case insensitive pathname thing.
I have this very dim memory that that's one reason samba ends up being
so readdir-intensive. But I might be wrong, it's been many years since
I talked to anybody about samba, and I don't run it myself.

But yes, that would likely be a very worthy profile target. I suspect
we've beaten the "make does lots of stat() calls" horse pretty much to
death.

Linus

2015-05-14 11:23:48

by Dave Chinner

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
> >
> > Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> > access() frequency on various loads. Sure, make(1) and cc(1) really care
> > about stat() very much, but I wouldn't be surprised if something like
> > httpd or samba would be hitting getxattr() a lot...
>
> So I haven't seen samba profiles in ages, but iirc we have more
> serious problems than trying to speed up basic filename lookup.
>
> At least long long ago, inode semaphore contention was a big deal,
> largely due to readdir().

It still is - it's the prime reason people still need to create
hashed directory structures so that they can get concurrency in
directory operations. IMO, concurrency in directory operations is a
more important problem to solve than worrying about readdir speed;
in large filesystems readdir and lookup are IO bound operations and
so everything serialises on the IO as it's done with the i_mutex
held....

> And readdir() itself, for that matter - we have no good vfs-level
> readdir caching, so it all ends up serialized on the inode
> semaphore, and it all goes all the way into the filesystem to get
> the readdir data. And at least for ext4, readdir()
> is slow anyway, because it doesn't use the page cache, it uses
> that good old buffer cache, because of how ext4 does metadata
> journaling etc.

IIRC, ext4 readdir is not slow because of the use of the buffer
cache, it's slow because of the way it hashes dirents across blocks
on disk. i.e. it has locality issues, not a caching problem.

> Having readdir() caching at the VFS layer would likely be a really
> good thing, but it's hard. It *might* be worth looking at the nfs4
> code to see if we could possibly move some of that code into the vfs
> layer, but the answer is likely "no", or at least "that's incredibly
> painful".

Maybe I'm missing something - what operation would be sped up by
caching readdir data? Are you trying to optimise the ->lookup that
tends to follow readdir by caching individual dirents? Or something
else?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-05-14 12:46:56

by Jan Kara

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu 14-05-15 21:23:04, Dave Chinner wrote:
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> > And readdir() itself, for that matter - we have no good vfs-level
> > readdir caching, so it all ends up serialized on the inode
> > semaphore, and it all goes all the way into the filesystem to get
> > the readdir data. And at least for ext4, readdir()
> > is slow anyway, because it doesn't use the page cache, it uses
> > that good old buffer cache, because of how ext4 does metadata
> > journaling etc.
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.
For ext4 readdir is just a linear read of the directory. Linus is right
we store directory blocks in buffer cache but we do our own readahead on
directory blocks so I don't think much slowness comes from that. One
thing that is slowing us down is that we don't do preallocation for
directories so they often end up being fragmented a lot.

The locality problem you are probably referring to is that readdir on ext4
returns directory entries in hash order. That is different from the
ordering by inode number which is optimal for the following cache-cold stat
/ unlink / whatever you want to do with inodes. This causes big performance
issues e.g. if you do rm -rf on large directory hierarchy. But you don't
see that often these days as lots of utilities have learned to workaround
ext4 problems by sorting directory entries by inode number before doing
anything with them.
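
The inode-order workaround described above can be sketched in user space. This is only an illustration under stated assumptions, not code taken from any of the utilities mentioned: `struct ent` and `list_sorted_by_ino()` are names invented for the example, and it relies only on the standard `opendir()`/`readdir()`/`qsort()` calls.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: the name and the inode number readdir() reported. */
struct ent {
	ino_t ino;
	char name[256];
};

/* qsort() comparator: ascending inode number. */
static int by_ino(const void *a, const void *b)
{
	const struct ent *x = a, *y = b;

	return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Read every entry of 'path' except "." and ".." and return them sorted by
 * inode number.  Returns the entry count, or -1 on error.  The caller
 * frees *out.
 */
static long list_sorted_by_ino(const char *path, struct ent **out)
{
	DIR *d;
	struct dirent *de;
	struct ent *v = NULL;
	size_t n = 0, cap = 0;

	assert(path != NULL && out != NULL);
	d = opendir(path);
	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		size_t len = strlen(de->d_name);

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (n == cap) {
			size_t ncap = cap ? cap * 2 : 64;
			struct ent *nv = realloc(v, ncap * sizeof(*v));

			if (!nv) {
				free(v);
				closedir(d);
				return -1;
			}
			v = nv;
			cap = ncap;
		}
		if (len >= sizeof(v[n].name))
			len = sizeof(v[n].name) - 1;
		memcpy(v[n].name, de->d_name, len);
		v[n].name[len] = '\0';
		v[n].ino = de->d_ino;
		n++;
	}
	closedir(d);
	if (n)
		qsort(v, n, sizeof(*v), by_ino);
	*out = v;
	return (long)n;
}
```

A caller would then walk the returned array in order and issue its stat() or unlink() calls, so that cache-cold inode reads tend to go through the inode table sequentially rather than in hash order.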

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-14 15:51:31

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 4:23 AM, Dave Chinner <[email protected]> wrote:
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.

No, you're just worrying about IO. Natural for a filesystem guy, but a
lot of loads cache really well, and IO isn't an issue. Yes, there's a
bad cold-cache case, but that's not when you get inode semaphore
contention. You get contention when you have lots of concurrent
accesses to the same directory, and then the data is all nice and hot
in the caches. But readdir() _still_ sucks donkey ass by the
bucket-load for that case.

And that's the case I'm talking about. Using the buffer cache for
readdir() is a complete disaster, because it means that

(a) you have to go down to the filesystem, wasting CPU resources, and
more importantly, going into code that by definition hasn't been
optimized as well and cannot ever be, because it's not common code
that everybody sees.

(b) you have to look up the physical block number, wasting even
*more* CPU resources, because the buffer heads are physically indexed

(c) you then use the buffer head lookup, which itself isn't horrible,
but it's not as well optimized as the page cache is.

(d) and because we call into the filesystem, not only is the code not
getting as much attention as the vfs layer, we generally can't trust
filesystem guys to get locking right (because 90% of the filesystems
don't get the attention they need even _without_ locking, and the 10%
that does is maintained by people who worry mainly about IO). So the
VFS layer has no real choice except to use a big-hammer "lock the
whole damn directory" approach.

End result: readdir() wastes a *lot* of time on stupid stuff (just
that physical block number lookup is generally more expensive than
readdir itself should be), and it does so with excessive locking,
serializing everything.

Both readdir() and path component lookup are technically read
operations, so why the hell do we use a mutex, rather than just get a
read-write lock for reading? Yeah, it's that (d) above. I might trust
xfs and ext4 to get their internal exclusions for allocations etc
right when called concurrently for the same directory. But the others?

I saw you talk about how the aio IO paths are "better" than the
regular page cache paths just a few days ago (when talking about
persistent memory). You're completely and utterly out to lunch,
*especially* with things like persistent memory, where the IO paths
wouldn't even *exist*, because things never get out of the cache. And
that out to lunch on this comes from your total fixation with IO. The
page cache is one studly mf in the normal cases when things are
cached, BUT YOU NEVER EVEN SEE THAT. Why? Because your filesystem code
never gets called for it, and the page cache ends up having almost
perfect behavior. It scales perfectly, and it scales with good
performance.

I understand where you are coming from, but caching really really
works. You ignore that, because you don't see those things, and the
caching case never affects you.

The readdir path? It sucks. And it sucks exactly because it's done in
the filesystem, and not in some VFS caches that we could actually make
go fast. We can't cache it well.

Basically, in computer science, pretty much all performance work is
about caching. And readdir is the one area where the VFS layer doesn't
do well, falls on its face and punts back to the filesystem.

Linus

2015-05-14 16:37:23

by Eric W. Biederman

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

Al Viro <[email protected]> writes:

> In particular, automounts will require
> discussing what exactly in the process' state is used for those - both
> with autofs/NFS/AFS/CIFS folks and with Eric (what netns should be used
> when we are crossing an NFSv4 referral point? Should it come from the
> NFS mount we'd found the referral on, or from the process that has run
> across it? There'd been a series from Ian around the interplay of
> autofs with namespaces, and IIRC it stepped into similar-sounding areas;
> it'll need to be looked into, etc.)

Almost certainly it should be the netns of NFS mount the referral
is on.

Looking at processes causes all kinds of caching problems, and is only
really appropriate when we are looking at something deliberate like
magic symlinks (i.e. /proc/self).

Automounts are the one case where I can imagine capturing the current
netns during mounting of the filesystem could be an issue.

Eric

2015-05-14 17:02:10

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 8:51 AM, Linus Torvalds
<[email protected]> wrote:
>
> Basically, in computer science, pretty much all performance work is
> about caching.

Credit where credit is due. Terje "almost all programming can be
viewed as an exercise in caching" Mathisen.

Linus

2015-05-14 22:09:39

by Jeremy Allison

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
> >
> > Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> > access() frequency on various loads. Sure, make(1) and cc(1) really care
> > about stat() very much, but I wouldn't be surprised if something like
> > httpd or samba would be hitting getxattr() a lot...
>
> So I haven't seen samba profiles in ages, but iirc we have more
> serious problems than trying to speed up basic filename lookup.

Let me know what you need :-).

>
> Also, I *think* samba ends up basically doing most of the pathname
> lookups from its own user-level cache, because of how it needs to
> avoid symlinks and do the whole crazy case insensitive pathname thing.
> I have this very dim memory that that's one reason samba ends up being
> so readdir-intensive. But I might be wrong, it's been many years since
> I talked to anybody about samba, and I don't run it myself.

You should, it's very good these days :-). We have to
walk the directory trees to do the case insensitive pathname
lookups, but we cache these so when we get an incoming
pathname we look it up in the cache, see if we can
stat the looked up name and if so we're done with pathname
resolution.

Of course we tell people to just set their filesystems
up using mkfs.xfs -n version=ci :-).

2015-05-14 23:22:27

by Al Viro

Subject: Re: [PATCH v3 074/110] VFS: replace {, total_}link_count in task_struct with pointer to nameidata

On Mon, May 11, 2015 at 07:07:34PM +0100, Al Viro wrote:
> From: NeilBrown <[email protected]>
>
> task_struct currently contains two ad-hoc members for use by the VFS:
> link_count and total_link_count. These are only interesting to fs/namei.c,
> so exposing them explicitly is poor layering. Incidentally, link_count
> isn't used anymore, so it can just die.
>
> This patches replaces those with a single pointer to 'struct nameidata'.
> This structure represents the current filename lookup of which
> there can only be one per process, and is a natural place to
> store total_link_count.
>
> This will allow the current "nameidata" argument to all
> follow_link operations to be removed as current->nameidata
> can be used instead in the _very_ few instances that care about
> it at all.
>
> As there are occasional circumstances where pathname lookup can
> recurse, such as through kern_path_locked, we always save and old
> current->nameidata (if there is one) when setting a new value, and
> make sure any active link_counts are preserved.
>
> follow_mount and follow_automount now get a 'struct nameidata *'
> rather than 'int flags' so that they can directly access
> total_link_count, rather than going through 'current'.

FWIW, the mainline semantics for total_link_count is bogus. Look: we set
it to zero in path_init() (since _way_ back - in fact, 2.4.12 had something like
that in path_walk()). Now, consider what happens when something like e.g.
NFS referral point crossing does pathname resolution (recursion there is
limited to "no more than once" by referral loop prevention logics in NFS code).
We get the damn thing quietly reset to zero, no matter how many symlinks had
already been crossed.

IMO we should drop that assignment in path_init() and let the logics in
set_nameidata/restore_nameidata do the right thing. The current semantics
is obviously wrong; we should either count the symlink traversals during
a referral resolution towards the limit or just limit those to 40 independently
from the symlink traversals outside of referral resolution, but resetting the
count to zero as soon as we'd hit the referral is clearly bogus.

I would prefer the "let's count them towards the limit" variant. Objections?
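
The "count them towards the limit" variant can be modelled with a toy in user space. Everything below is a simplified stand-in written for illustration: `struct toy_nameidata`, `toy_current` and the `toy_*` helpers are hypothetical names, not the kernel's code. The point it shows is that a nested lookup inherits the running count when it starts and hands it back when it finishes, so nothing is reset to zero along the way.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAXSYMLINKS 40	/* the limit discussed in this thread */

/* Toy stand-in for struct nameidata: only what the example needs. */
struct toy_nameidata {
	struct toy_nameidata *saved;	/* enclosing lookup, if any */
	int total_link_count;
};

/* Toy stand-in for current->nameidata. */
static struct toy_nameidata *toy_current;

/* Start a lookup; a nested lookup inherits the count accumulated so far. */
static void toy_set_nameidata(struct toy_nameidata *nd)
{
	struct toy_nameidata *old = toy_current;

	assert(nd != NULL);
	nd->saved = old;
	nd->total_link_count = old ? old->total_link_count : 0;
	toy_current = nd;
}

/* Finish a lookup; hand the count back to the enclosing lookup. */
static void toy_restore_nameidata(void)
{
	struct toy_nameidata *now = toy_current, *old;

	assert(now != NULL);
	old = now->saved;
	toy_current = old;
	if (old)
		old->total_link_count = now->total_link_count;
}

/* Account for one symlink traversal in the innermost lookup. */
static int toy_follow_symlink(void)
{
	assert(toy_current != NULL);
	if (toy_current->total_link_count >= TOY_MAXSYMLINKS)
		return -ELOOP;
	toy_current->total_link_count++;
	return 0;
}
```

With this model, 30 traversals in an outer lookup followed by a nested lookup leave room for only 10 more before -ELOOP, which is the behaviour the paragraph above argues for.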

2015-05-14 23:24:15

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison <[email protected]> wrote:
>
> Of course we tell people to just set their filesystems
> up using mkfs.xfs -n version=ci :-).

So ASCII-only case-insensitivity is sufficient for you guys?

Doing case-insensitive lookups at a vfs layer level wouldn't be
impossible (add some new lookup flag, so it would *not* be
per-filesystem, it would be per-operation!), but the *full*
case-insensitivity space in utf-8 is too much to expect. Especially
since different people have different opinions on what it even should
be.

What else is problematic? I think you want an error on symlinks in the
middle, right? So that you can do those manually? I also assume you
don't like to follow ".." due to containment issues?

Adding (again per-lookup) flags for "no symlinks" and "no dotdot"
would be trivial (much more so than the case insensitivity). Would you
require "error out on non-ascii characters" too, to then handle the
complex cases by hand?

Obviously per-lookup flags means that you can't just use "stat()", it
would be "fstatat()".

I dunno. But it *may* be worth it to really try to give samba what it
wants. Of course, if samba is happy doing all the name caching in user
space, then that's not worth worrying about.

And the reason I don't use samba myself is that I'm not a fan of
network filesystems. I want my filesystems low-latency and right there
on the local ssd, thank you very much. But if you have some
local-machine benchmarking thing you use, I guess I could use that to
see what the profile looks like...

Linus

2015-05-14 23:37:04

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:

> So ASCII-only case-insensitivity is sufficient for you guys?
>
> Doing case-insensitive lookups at a vfs layer level wouldn't be
> impossible (add some new lookup flag, so it would *not* be
> per-filesystem, it would be per-operation!),

ENOPARSE. Either two names are equivalent or they are not; it's not a
per-operation thing. What do you mean?

2015-05-14 23:57:29

by Jeremy Allison

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison <[email protected]> wrote:
> >
> > Of course we tell people to just set their filesystems
> > up using mkfs.xfs -n version=ci :-).
>
> So ASCII-only case-insensitivity is sufficient for you guys?

No it's not enough really. But for specific Windows apps that
use restricted namespaces (and there are such) it works.

ZFS on *BSD does do full case-insensitive lookups (utf8) as part of
FreeNAS. I think if it's configured as an SMB-only share they turn
that on.

> Doing case-insensitive lookups at a vfs layer level wouldn't be
> impossible (add some new lookup flag, so it would *not* be
> per-filesystem, it would be per-operation!), but the *full*
> case-insensitivity space in utf-8 is too much to expect. Especially
> since different people have different opinions on what it even should
> be.

Yeah, Apple are the real sinners here. utf8-compose-characters - pah !

> What else is problematic? I think you want an error on symlinks in the
> middle, right? So that you can do those manually? I also assume you
> don't like to follow ".." due to containment issues?

We already strip . and .. out of incoming pathnames. Once we've
walked the path (following links) we then use realpath() to ensure
the full path is under the exported share "path =" directive.
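
A containment check of that kind might look roughly like the following. This is a sketch of the general realpath()-and-prefix technique, not Samba's actual code, and `path_is_under()` is a name invented here. It also ignores the race between the check and any later use of the path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if 'path' resolves to 'root' itself or to something beneath it,
 * 0 if it resolves elsewhere, -1 if either path cannot be resolved.
 */
static int path_is_under(const char *root, const char *path)
{
	char *r, *p;
	size_t rlen;
	int ok;

	assert(root != NULL && path != NULL);
	r = realpath(root, NULL);	/* canonical form, symlinks resolved */
	if (!r)
		return -1;
	p = realpath(path, NULL);
	if (!p) {
		free(r);
		return -1;
	}
	rlen = strlen(r);
	if (rlen == 1 && r[0] == '/')
		ok = 1;		/* everything lives under "/" */
	else
		ok = strncmp(r, p, rlen) == 0 &&
		     (p[rlen] == '/' || p[rlen] == '\0');
	free(r);
	free(p);
	return ok;
}
```

The extra test on `p[rlen]` keeps a sibling such as /export/share2 from being accepted as lying under /export/share.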

> Adding (again per-lookup) flags for "no symlinks" and "no dotdot")
> would be trivial (much more so than the case insensitivity). Would you
> require "error out on non-ascii characters" too, to then handle the
> complex cases by hand?

Don't need the dot-dot stuff - "no symlinks" would be useful,
but you have to remember Samba is still portable to *BSD and
Solaris-clones (all that anyone really cares about these days)
so we'll have to keep the old code paths too.

> I dunno. But it *may* be worth it to really try to give samba what it
> wants. Of course, if samba is happy doing all the name caching in user
> space, then that's not worth worrying about.

Case insensitive pathname lookup is the one I remember
Windows sales guys beating us up on benchmarks (create 1,000,000 files
in a directory and then have a client that looks for a file
that *doesn't* exist :-).

Hopefully Volker is also on this thread and can chime in with
some requirements of his own :-).

> And the reason I don't use samba myself is that I'm not a fan of
> network filesystems. I want my filesystems low-latency and right there
> on the local ssd, thank you very much. But if you have some
> local-machine benchmarkign thing you use, I guess I could use that to
> see what the profile looks like...

You don't know what you're missing mate !

I use it every day via my local NAS to play music (via SONOS
Linux clients) and movies/TV (SageTV Linux boxes) everywhere
in the house. I don't even have a Windows box (other than
VM's for testing) anywhere.

But my house is fully wired for gigabit, I wouldn't do that over
wireless :-).

Samba, it's not just for Windows anymore :-).

2015-05-15 00:25:46

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 4:36 PM, Al Viro <[email protected]> wrote:
> On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
>
>> So ASCII-only case-insensitivity is sufficient for you guys?
>>
>> Doing case-insensitive lookups at a vfs layer level wouldn't be
>> impossible (add some new lookup flag, so it would *not* be
>> per-filesystem, it would be per-operation!),
>
> ENOPARSE. Either two names are equivalent or they are not; it's not a
> per-operation thing. What do you mean?

We can easily make things per-operation, by adding another flag. We
already have per-operation flags like LOOKUP_FOLLOW, which decides if
we follow the last symlink or not. We could add a LOOKUP_ICASE, which
decides whether we compare case or not. Obviously, we'd have to add the
proper O_ICASE for open (and AT_ICASE for fstatat() and friends).
Exactly like we do for LOOKUP_FOLLOW.

HOWEVER.

The reason ASCII-only matters is two-fold:

(a) hashing needs to work, and hash all equivalent names to the same
bucket. And we need to hash the same *regardless* of whether the
operation was done with ICASE or not.

With ASCII, this is fairly easy: we could easily make the hashing
just mask bit 5 in each byte, and that wouldn't slow us down at all,
and it would hardly change the hash effectiveness either.

In particular, with ASCII, we can trivially still do the
word-at-a-time hashing. So there's fairly little downside.

(b) The *compare* needs to work too. In particular, right now we very
much try to avoid comparing the names by checking both the full hash
and the name length. Again, that's fine with ASCII - two names that
differ in case are the same length.

And again, we can still use the word-at-a-time compare, just have
a mask (and at compare time, we can make the mask depend on ICASE).
Sure, you'll still have to do a more careful compare (because
case-insensitivity is not *just* "same except for bit 5", even in
ASCII), but we can trivially have an ICASE test up front, and keep the
fast case exactly the same as before.
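
Points (a) and (b) can be illustrated with a toy in user space. The hash below is plain FNV-1a with bit 5 masked out of every byte. It is not the dentry hash, and all of the `toy_*` names are invented for this sketch. It shows that names differing only in ASCII case land on the same hash, that eight bytes can be folded with a single mask, and why the compare still has to be more careful than the hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hash with bit 5 (0x20) cleared in each byte, so 'a' and 'A' hash alike. */
static uint32_t toy_icase_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	assert(name != NULL);
	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i] & ~0x20u;
		h *= 16777619u;		/* FNV-1a prime */
	}
	return h;
}

/* Load eight bytes of a name into one word. */
static uint64_t toy_load_word(const char *p)
{
	uint64_t w;

	assert(p != NULL);
	memcpy(&w, p, sizeof(w));
	return w;
}

/* Word-at-a-time variant of the same folding: one mask covers 8 bytes. */
static uint64_t toy_fold_word(uint64_t w)
{
	return w & UINT64_C(0xdfdfdfdfdfdfdfdf);
}

/* Only ASCII letters fold; everything else compares exactly. */
static int toy_ascii_lower(int c)
{
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* The "more careful compare": equal length assumed, letters fold only. */
static int toy_icase_equal(const char *a, const char *b, size_t len)
{
	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < len; i++)
		if (toy_ascii_lower((unsigned char)a[i]) !=
		    toy_ascii_lower((unsigned char)b[i]))
			return 0;
	return 1;
}
```

Here "Blah" and "blah" hash identically and compare equal, while "@" and "`" also hash identically (0x40 and 0x60 differ only in bit 5) but must not compare equal. That is the gap between masking and real ASCII case folding.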

Now, doing full UTF-8 is *much* harder. Part of it is that outside of
ASCII, you literally have cases that are ambiguous. Part of it is that
outside of ASCII, now the lengths aren't even guaranteed to match. And
part of it is that now you have to do things that are much more
complex than just masking bits in parallel for multiple bytes at the
same time (although you can still have a fast-path that depends on
just masking the high bit, to at least say "this is just the ASCII
subcase").

But doing ASCII ICASE compares wouldn't be that hard, and wouldn't
affect performance.

Btw, don't get me wrong. I'm not saying it's a great idea. I think
icase compares are stupid. Really really stupid. But samba might be
worth jumping through a few hoops for. The real problem is that even
with just ASCII, it does make it much easier to create nasty hash
collisions in the dentry hashes (same hash from 256 variations of
aAaAAaaA - just repeat the same letter in different variations of
lower/upper case).

So even plain ASCII icase has some real problems. But it's
conceptually not that hard. True UTF-8 icase? That's an absolute
*nightmare*, and causes serious problems. OS X got it very very wrong,
for example, by messing up the normalization.

Linus

2015-05-15 01:26:54

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 05:25:39PM -0700, Linus Torvalds wrote:

> We can easily make things per-operation, by adding another flag. We
> already have per-operation flags like LOOKUP_FOLLOW, which decides if
> we follow the last symlink or not. We could add a LOOKUP_ICASE, which
> decides whether we compare case or not. Obviously, we'd have to ad the
> proper O_ICASE for open (and AT_ICASE for fstatat() and friends).
> Exactly like we do for LOOKUP_FOLLOW.

> Btw, don't get me wrong. I'm not saying it's a great idea. I think
> icase compares are stupid. Really really stupid. But samba might be
> worth jumping though a few hoops for. The real problem is that even
> with just ASCII, it does make it much easier to create nasty hash
> collisions in the dentry hashes (same hash from 256 variations of
> aAaAAaaA - just repeat the same letter in different variations of
> lower/upper case).

Hold on. Should
stat("blah", &buf) => ENOENT, OK, let's create it
mkdir("blah", 0) => EEXIST, bugger, looks like a race
stat("blah", &buf) => ENOENT, Whiskey, Tango, Foxtrot
be possible? No per-operation flags passed, doesn't even know of the
case-insensitive crap. And if fstatat() without your new flag would
find c-i matches, then what does that flag do?

Confused...

2015-05-15 02:18:20

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 6:26 PM, Al Viro <[email protected]> wrote:
>
> Hold on. Should
> stat("blah", &buf) => ENOENT, OK, let's create it
> mkdir("blah", 0) => EEXIST, bugger, looks like a race
> stat("blah", &buf) => ENOENT, Whiskey, Tango, Foxtrot
> be possible?

No. What I described would not in any way change any of the above. I'm
not understanding what your point is.

The only difference - EVER - would be if you pass in the ICASE flag.
Nothing I suggested would change semantics without it (the _hash_
changes, but that doesn't change semantics, it's a purely internal
random number).

Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that
point the dentry lookup would match case-insensitively.

For example, let's say that you have a directory where you already
have both "Blah" and "blah", because you created them in a sane
environment. They'll be two different dentries (assuming they are
cached), but they'll have the same dentry hash.

Now, you open "blah" with O_ICASE, and the end result is that you
would randomly open one or the other (it would be the one you find
first on the hash chain). Tough. Mixing icase and case-sensitive is
by definition going to cause those kinds of issues.

The nasty issue (and the case that samba apparently wants it for) is
that ICASE wouldn't be able to trust negative dentries (us having a
negative dentry in one case doesn't mean that it's negative in ICASE).
And that might be the killer part. Negative dentries are really
useful.

Now, the VFS layer support part is I think fairly simple. I might be
wrong, but I really think the hashing etc wouldn't be too painful.
After all, we already do support ->d_hash() and ->d_compare(), this is
"more of the same", just supported at a vfs level directly (and
_allowing_ aliases in case).

The real pain is that the low-level filesystem has to support it too.
That's simple for some filesystems, but it can be hard for things that
hash filenames. Because there - unlike at the VFS layer - the hashes
have meaning and you can't just change them to suit a ICASE lookup
(because they exist on-disk).

So supporting that is likely trivial on filesystems like FAT or SYSV,
which just iterate over the directory anyway at lookup() time. On ext*
with hashed directories, it's nasty (and a ICASE lookup would probably
have to just walk the whole directory, old-style). But I think all the
code to do the nonhashed lookup is still there, since it is a
filesystem feature bit. And it would only need to do that linear
search thing when the ICASE flag is set in the lookup flags.
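
For comparison, the linear walk in question looks roughly like this when done from user space. It is a sketch only: `lookup_icase_linear()` is a name invented here, and `strcasecmp()` folds case according to the current locale, which matches the ASCII-only rule only in the C/POSIX locale.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() */

/*
 * Scan 'dirpath' for an entry whose name equals 'want' ignoring case.
 * On a match, copy the on-disk spelling into 'found' and return 1.
 * Return 0 if nothing matches, -1 on error or if 'found' is too small.
 * If several entries match, whichever readdir() returns first wins.
 */
static int lookup_icase_linear(const char *dirpath, const char *want,
			       char *found, size_t foundsz)
{
	DIR *d;
	struct dirent *de;
	int ret = 0;

	assert(dirpath != NULL && want != NULL && found != NULL);
	d = opendir(dirpath);
	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcasecmp(de->d_name, want) != 0)
			continue;
		if (strlen(de->d_name) >= foundsz) {
			ret = -1;	/* caller's buffer is too small */
			break;
		}
		strcpy(found, de->d_name);
		ret = 1;
		break;
	}
	closedir(d);
	return ret;
}
```

A miss costs a full pass over the directory, which is the expensive case in the benchmark Jeremy mentioned (a lookup for a name that does not exist in a directory of 1,000,000 files).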

Of course, if it ends up just walking the directory linearly anyway,
it doesn't fix the one samba performance problem that Jeremy pointed
out, so that makes this of dubious value. If we can't do this better
than samba can already do it on its own, it's kind of pointless.

Again - the filesystems (and the vfs layer) would remain case
sensitive. But I think it might be fairly straightforward to allow
per-operation ICASE handling for things that want it.

Keyword "think". Maybe there's something I didn't think of.

Linus

2015-05-15 02:51:10

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 07:18:16PM -0700, Linus Torvalds wrote:
> The only difference - EVER - would be if you pass in the ICASE flag.
> Nothing I suggested would change semantics without it (the _hash_
> changes, but that doesn't change semantics, it's a purely internal
> random number).
>
> Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that
> point the dentry lookup would match case-insensitively.
>
> For example, let's say that you have a directory where you already
> have both "Blah" and "blah", because you created them in a sane
> environment. They'll be two different dentries (assuming they are
> cached), but they'll have the same dentry hash.
>
> Now, you open "blah" with O_ICASE, and the end result is that you
> would randomly open one or the other (it would be the one you find
> first on the hash chain). Tough. Mixing icase and case-insensitive is
> by definition going to cause those kinds of issues.

With c-i mount, unpacking a tarball with tar(1) will not (silently, at that)
create a situation when your c-i users will get lookups proceeding into
randomly picked variant of directory, with variants differing only in
case. It will do the right thing and put the files where they would be
expected, giving an expected "it already exists" if you try to create
a directory with the name that matches that of existing file, etc.
With this, OTOH, you'll have to use specialized tools for creating
files in that tree, or risk random lossage, because creating /mnt/foo/bar
when /mnt/Foo/Bugger already existed will succeed just fine, leaving one
hell of a mess for c-i users.

What's the benefit compared to c-i mount? Not hitting filesystem's
->d_hash() and ->d_compare()?

2015-05-15 05:07:48

by Al Viro

Subject: Re: [PATCH v3 105/110] namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

On Tue, May 12, 2015 at 05:10:01AM +0100, Al Viro wrote:

> +static int unlazy_link(struct nameidata *nd, struct path *link, unsigned seq)
> +{
> + if (unlikely(!legitimize_path(nd, link, seq))) {
> + drop_links(nd);
> + rcu_read_unlock();
> + nd->flags &= ~LOOKUP_RCU;
> + nd->path.mnt = NULL;
> + nd->path.dentry = NULL;
> + if (!(nd->flags & LOOKUP_ROOT))
> + nd->root.mnt = NULL;

... and nd->depth should be set to 0, to avoid bogus path_put() on the
stuff in nd->stack[...].link when we get to terminate_walk(). Fixed and
folded.

> + } else if (likely(unlazy_walk(nd, NULL, 0)) == 0) {
> + return 0;
> + }
> + path_put(link);
> + return -ECHILD;
> +}
> +
> static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
> {
> return dentry->d_op->d_revalidate(dentry, flags);
> @@ -1537,20 +1613,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
> return 0;
> }
>
> -static void terminate_walk(struct nameidata *nd)
> -{
> - if (!(nd->flags & LOOKUP_RCU)) {
> - path_put(&nd->path);
> - } else {
> - nd->flags &= ~LOOKUP_RCU;
> - if (!(nd->flags & LOOKUP_ROOT))
> - nd->root.mnt = NULL;
> - rcu_read_unlock();
> - }
> - while (unlikely(nd->depth))
> - put_link(nd);
> -}
> -
> static int pick_link(struct nameidata *nd, struct path *link,
> struct inode *inode, unsigned seq)
> {
> @@ -1561,13 +1623,12 @@ static int pick_link(struct nameidata *nd, struct path *link,
> return -ELOOP;
> }
> if (nd->flags & LOOKUP_RCU) {
> - if (unlikely(nd->path.mnt != link->mnt ||
> - unlazy_walk(nd, link->dentry, seq))) {
> + if (unlikely(unlazy_link(nd, link, seq)))
> return -ECHILD;
> - }
> + } else {
> + if (link->mnt == nd->path.mnt)
> + mntget(link->mnt);
> }
> - if (link->mnt == nd->path.mnt)
> - mntget(link->mnt);
> error = nd_alloc_stack(nd);
> if (unlikely(error)) {
> path_put(link);

2015-05-15 05:11:07

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v3 109/110] namei: handle absolute symlinks without dropping out of RCU mode

On Mon, May 11, 2015 at 07:08:09PM +0100, Al Viro wrote:
> @@ -499,7 +499,7 @@ struct nameidata {
> struct path root;
> struct inode *inode; /* path.dentry.d_inode */
> unsigned int flags;
> - unsigned seq, m_seq;
> + unsigned seq, m_seq, root_seq;
> int last_type;
> unsigned depth;
> int total_link_count;
> @@ -780,14 +780,14 @@ static __always_inline void set_root(struct nameidata *nd)
> static __always_inline unsigned set_root_rcu(struct nameidata *nd)
> {
> struct fs_struct *fs = current->fs;
> - unsigned seq, res;
> + unsigned seq;
>
> do {
> seq = read_seqcount_begin(&fs->seq);
> nd->root = fs->root;
> - res = __read_seqcount_begin(&nd->root.dentry->d_seq);
> + nd->root_seq = __read_seqcount_begin(&nd->root.dentry->d_seq);

nd->root_seq is also needed in LOOKUP_ROOT | LOOKUP_RCU case. Fixed and
folded.

2015-05-15 18:30:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 7:51 PM, Al Viro <[email protected]> wrote:
>
> What's the benefit compared to c-i mount? Not hitting filesystem's
> ->d_hash() and ->d_compare()?

So the reason I'd be interested in per-access flags rather than mount flags are:

- only special apps should use this anyway. IOW, samba and perhaps
things like wine, and that's absolutely it. The argument that it might
confuse "tar" is bogus, exactly because not only would tar never do
this in the first place, tar absolutely *mustn't* do crap like this
anyway. case-insensitive filesystems are insane, the *only* possible
valid reason for them is for "emulate insane systems".

- mount flags are bad. They'd be useless for something like wine
(where you want to make part of the user's home directory be the
filesystem), and they are bad for things like samba too. Having to
make a whole filesystem case-insensitive is crazy, because
case-insensitivity is crazy. You want to make one application able to
use case-insensitivity, not make all accesses so.

- we do have cases where more per-access flags might be a really good
idea. The whole "don't follow any/absolute symlinks" and "don't follow
dotdot" are real concerns in various other places that now waste time
trying to do it manually (and generally do it badly at that - see all
the historical apache issues with dotdot to escape the publicly
visible areas). I think it's a mistake in general to think that these
kinds of things should be per-mount.
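
As an illustration of the "do it manually" approach mentioned in the last point, a purely lexical dotdot check in userspace might look like the sketch below. The function name `path_has_dotdot` is hypothetical. The check looks only at the string, so it knows nothing about symlinks, which is one reason such checks tend to be incomplete.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if any '/'-separated component of path is exactly "..".
 * Purely lexical: symlinks are not examined at all. */
static bool path_has_dotdot(const char *path)
{
	assert(path != NULL);
	while (*path) {
		size_t len = strcspn(path, "/");   /* length of this component */

		if (len == 2 && path[0] == '.' && path[1] == '.')
			return true;
		path += len;
		while (*path == '/')               /* skip separator(s) */
			path++;
	}
	return false;
}
```

A per-access lookup flag would let the kernel enforce the same rule during the walk itself, where symlink contents are actually visible.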

That said, if the main/only reason samba would use this is for the
case it would be bad at handling anyway (negative lookups in big
directories), it's definitely not worth it.

Linus

2015-05-15 21:15:39

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On May 14, 2015, at 5:23 AM, Dave Chinner <[email protected]> wrote:
>
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
>> On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
>>>
>>> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
>>> access() frequency on various loads. Sure, make(1) and cc(1) really care
>>> about stat() very much, but I wouldn't be surprised if something like
>>> httpd or samba would be hitting getxattr() a lot...
>>
>> So I haven't seen samba profiles in ages, but iirc we have more
>> serious problems than trying to speed up basic filename lookup.
>>
>> At least long long ago, inode semaphore contention was a big deal,
>> largely due to readdir().
>
> It still is - it's the prime reason people still need to create
> hashed directory structures so that they can get concurrency in
> directory operations. IMO, concurrency in directory operations is a
> more important problem to solve than worrying about readdir speed;
> in large filesystems readdir and lookup are IO bound operations and
> so everything serialises on the IO as it's done with the i_mutex
> held....

We've had a patch[*] to add ext4 parallel directory operations in Lustre for
a few years, that adds separate locks for each internal tree and leaf block
instead of using i_mutex, so it scales as the size of the directory grows.
This definitely improved many-threaded directory create/lookup/unlink
performance (rename still uses a single lock).
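
For readers who have not looked at the patch, the general idea of per-block locking can be sketched in user space as below. This is only a toy model under stated assumptions: `model_dir`, `name_to_block`, `model_create` and `NR_BLOCKS` are hypothetical names, the hash is arbitrary, and the real patch locks htree index and leaf blocks, which is considerably more involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_BLOCKS 64   /* hypothetical number of leaf blocks */

struct model_dir {
	pthread_mutex_t block_lock[NR_BLOCKS];  /* one lock per leaf block */
	unsigned long entries[NR_BLOCKS];       /* entries stored in each block */
};

static void model_dir_init(struct model_dir *dir)
{
	for (size_t i = 0; i < NR_BLOCKS; i++) {
		pthread_mutex_init(&dir->block_lock[i], NULL);
		dir->entries[i] = 0;
	}
}

/* Map a name to the leaf block that would hold it (toy hash). */
static size_t name_to_block(const char *name)
{
	unsigned long h = 5381;

	assert(name != NULL);
	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % NR_BLOCKS;
}

/* Create an entry while holding only the lock of the affected block, so
 * creates that land in different blocks do not serialise on each other. */
static size_t model_create(struct model_dir *dir, const char *name)
{
	size_t blk = name_to_block(name);

	pthread_mutex_lock(&dir->block_lock[blk]);
	dir->entries[blk]++;
	pthread_mutex_unlock(&dir->block_lock[blk]);
	return blk;
}
```

The more blocks a directory has, the less likely two concurrent creates are to want the same lock, which matches the "scales as the size of the directory grows" behaviour described above.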

Since it is used only directly on the server within the kernel there is no
VFS interface for it. The last time that I brought VFS integration up with
Al a year or two ago he thought the complexity was very high and not worth
the effort, but maybe with increasing thread counts on clients it is time
to look into it?

Cheers, Andreas

[*] For anyone interested, the latest version of the ext4 patch is at:
http://git.whamcloud.com/fs/lustre-release.git/blob/ac8a7f3d22ef007bcf62bfb4c6ed747cec6e874e:/ldiskfs/kernel_patches/patches/rhel7/ext4-pdirop.patch

>> And readdir() itself, for that matter - we have no good vfs-level
>> readdir caching, so it all ends up serialized on the inode
>> semaphore, and it all goes all the way into the filesystem to get
>> the readdir data. And at least for ext4, readdir()
>> is slow anyway, because it doesn't use the page cache, it uses
>> that good old buffer cache, because of how ext4 does metadata
>> journaling etc.
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.
>
>> Having readdir() caching at the VFS layer would likely be a really
>> good thing, but it's hard. It *might* be worth looking at the nfs4
>> code to see if we could possibly move some of that code into the vfs
>> layer, but the answer is likely "no", or at least "that's incredibly
>> painful".
>
> Maybe I'm missing something - what operation would be sped up by
> caching readdir data? Are you trying to optimise the ->lookup that
> tends to follow readdir by caching individual dirents? Or something
> else?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]


2015-05-15 23:30:39

by NeilBrown

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, 15 May 2015 15:15:48 -0600 Andreas Dilger <[email protected]> wrote:

> On May 14, 2015, at 5:23 AM, Dave Chinner <[email protected]> wrote:
> >
> > On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> >> On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
> >>>
> >>> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> >>> access() frequency on various loads. Sure, make(1) and cc(1) really care
> >>> about stat() very much, but I wouldn't be surprised if something like
> >>> httpd or samba would be hitting getxattr() a lot...
> >>
> >> So I haven't seen samba profiles in ages, but iirc we have more
> >> serious problems than trying to speed up basic filename lookup.
> >>
> >> At least long long ago, inode semaphore contention was a big deal,
> >> largely due to readdir().
> >
> > It still is - it's the prime reason people still need to create
> > hashed directory structures so that they can get concurrency in
> > directory operations. IMO, concurrency in directory operations is a
> > more important problem to solve than worrying about readdir speed;
> > in large filesystems readdir and lookup are IO bound operations and
> > so everything serialises on the IO as it's done with the i_mutex
> > held....
>
> We've had a patch[*] to add ext4 parallel directory operations in Lustre for
> a few years, that adds separate locks for each internal tree and leaf block
> instead of using i_mutex, so it scales as the size of the directory grows.
> This definitely improved many-threaded directory create/lookup/unlink
> performance (rename still uses a single lock).

.. and I've been wondering what to do about i_mutex and NFS. I've had
customer reports of slowness in creating files that seems to be due to
i_mutex on the directory being held over the whole 'create' RPC, so only one
of those can be in flight at the one time.
"make -j" on a large source directory can easily want to create lots of
"*.o" files at "the same time".

And NFS doesn't need i_mutex at all because the server will provide the
needed guarantees.

NeilBrown


2015-05-15 23:38:23

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 08:51:12AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 4:23 AM, Dave Chinner <[email protected]> wrote:
> >
> > IIRC, ext4 readdir is not slow because of the use of the buffer
> > cache, it's slow because of the way it hashes dirents across blocks
> > on disk. i.e. it has locality issues, not a caching problem.
>
> No, you're just worrying about IO. Natural for a filesystem guy, but a
> lot of loads cache really well, and IO isn't an issue. Yes, there's a
> bad cold-cache case, but that's not when you get inode semaphore
> contention.

Right, because it's cold cache performance that everyone complains
about. e.g. Workloads like gluster, ceph, fileservers, openstack
(e.g. swift) etc are all mostly cold cache directory workloads with
*extremely high* concurrency. Nobody is complaining about cached
readdir performance - concurrency in cold cache directory operations
is what everyone has been asking me for.

In case you missed it, recently the Ceph developers have been
talking about storing file handles in a userspace database and then
using open_by_handle_at() so they can avoid the pain of cold cache
directory lookup overhead (see the O_NOMTIME thread). We have a
serious cold cache lookup problem on directories when people are
looking to bypass the directory structure entirely....

[snip a bunch of rhetoric lacking in technical merit]

> End result: readdir() wastes a *lot* of time on stupid stuff (just
> that physical block number lookup is generally more expensive than
> readdir itself should be), and it does so with excessive locking,
> serializing everything.

The most overhead in readdir is calling filldir over and over again
for every dirent to copy it into the user buffer. The overhead is
not from looking up the buffer in the cache.

So, I just created close to a million dirents in a directory, and
ran the xfs_io readdir command on it (look, a readdir performance
measurement tool!). I used a ram disk to take IO out of the picture
for the first read, the system has E5-4620 0 @ 2.20GHz CPUs, and I
dropped caches to ensure that there was no cached metadata:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ sudo xfs_io -c readdir /mnt/scratch
read 29545648 bytes from offset 0
28 MiB, 923327 ops, 0.0000 sec (111.011 MiB/sec and 3637694.9201 ops/sec)
$ sudo xfs_io -c readdir /mnt/scratch
read 29545648 bytes from offset 0
28 MiB, 923327 ops, 0.0000 sec (189.864 MiB/sec and 6221628.5056 ops/sec)
$ sudo xfs_io -c readdir /mnt/scratch
read 29545648 bytes from offset 0
28 MiB, 923327 ops, 0.0000 sec (190.156 MiB/sec and 6231201.6629 ops/sec)
$

Reading, decoding and copying dirents at 190MB/s? That's roughly 6
million dirents/second being pulled from cache, and it's doing
roughly 4 million/second cold cache. That's not slow at all.

What *noticeable* performance gains are there to be had here for the
average user? Anything that takes less than a second or two to
complete is not going to be noticeable to a user, and most people
don't have 8-10 million inodes in a directory....

So, what did the profile look like?

10.07% [kernel] [k] __xfs_dir3_data_check
9.92% [kernel] [k] copy_user_generic_string
7.44% [kernel] [k] xfs_dir_ino_validate
6.83% [kernel] [k] filldir
5.43% [kernel] [k] xfs_dir2_leaf_getdents
4.56% [kernel] [k] kallsyms_expand_symbol.constprop.1
4.38% [kernel] [k] _raw_spin_unlock_irqrestore
4.26% [kernel] [k] _raw_spin_unlock_irq
4.02% [kernel] [k] __memcpy
3.02% [kernel] [k] format_decode
2.36% [kernel] [k] xfs_dir2_data_entsize
2.28% [kernel] [k] vsnprintf
1.99% [kernel] [k] __do_softirq
1.93% [kernel] [k] xfs_dir2_data_get_ftype
1.88% [kernel] [k] number.isra.14
1.84% [kernel] [k] _xfs_buf_find
1.82% [kernel] [k] ___might_sleep
1.61% [kernel] [k] strnlen
1.49% [kernel] [k] queue_work_on
1.48% [kernel] [k] string.isra.4
1.21% [kernel] [k] __might_sleep


Oh, I'm running CONFIG_XFS_DEBUG=y, so internal runtime consistency
checks consume most of the CPU (__xfs_dir3_data_check,
xfs_dir_ino_validate). IOWs, real world readdir performance will be
much, much faster than I've demonstrated.

Other than that, the most CPU is spent on copying dirents into the
user buffer (copy_user_generic_string), passing dirents to the user
buffer (filldir) and extracting dirents from the on-disk buffer
(xfs_dir2_leaf_getdents). Then we have lock contention, ramdisk IO
(memcpy), some vsnprintf stuff (includes format_decode, probably
debug code) and some more dirent information extraction functions.

It's not until we get to _xfs_buf_find() that we see a buffer cache
lookup function, and that's actually consuming less CPU than the
__might_sleep/___might_sleep() debug annotations. That puts into
perspective just how little overhead readdir buffer caching
actually has compared to everything else.

IOWs, these numbers indicate that readdir caching overhead has no
real impact on the performance of hot cache readdir operations.

So, back to the question I asked that you didn't answer: exactly
what are you proposing to cache in the VFS readdir cache? Without
knowing that, I can't make any sane comment about the technical merit
of your proposal....

> Both readdir() and path component lookup are technically read
> operations, so why the hell do we use a mutex, rather than just
> get a read-write lock for reading? Yeah, it's that (d) above. I
> might trust xfs and ext4 to get their internal exclusions for
> allocations etc right when called concurrently for the same
> directory. But the others?

They just use a write lock for everything and *nothing changes* -
this is a simple problem to solve.

The argument "filesystem developers are stupid" is not a
compelling argument against changing locking. You're just being
insulting, even though you probably don't realise it.

[snip more rhetoric about the page cache being the only solution]

> Basically, in computer science, pretty much all performance work
> is about caching. And readdir is the one area where the VFS layer
> doesn't do well, falls on its face and punts back to the
> filesystem.

Caching is used to hide the problems of the lower layers. If the
lower layers don't have a problem, then another layer of caching is
not necessary.

Linus, what you haven't put together is a clear statement of the
problem another layer of readdir caching is going to solve. What
workload is having problems? Where are the profiles demonstrating
that readdir caching is the issue, or the solution to the issue you
are seeing? We know about plenty of workloads where directory access
concurrency is a real problem, but I'm not seeing the problem you
are trying to address...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-05-15 23:39:20

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 03:15:48PM -0600, Andreas Dilger wrote:
> On May 14, 2015, at 5:23 AM, Dave Chinner <[email protected]> wrote:
> >
> > On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> >> On Wed, May 13, 2015 at 8:30 PM, Al Viro <[email protected]> wrote:
> >>>
> >>> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> >>> access() frequency on various loads. Sure, make(1) and cc(1) really care
> >>> about stat() very much, but I wouldn't be surprised if something like
> >>> httpd or samba would be hitting getxattr() a lot...
> >>
> >> So I haven't seen samba profiles in ages, but iirc we have more
> >> serious problems than trying to speed up basic filename lookup.
> >>
> >> At least long long ago, inode semaphore contention was a big deal,
> >> largely due to readdir().
> >
> > It still is - it's the prime reason people still need to create
> > hashed directory structures so that they can get concurrency in
> > directory operations. IMO, concurrency in directory operations is a
> > more important problem to solve than worrying about readdir speed;
> > in large filesystems readdir and lookup are IO bound operations and
> > so everything serialises on the IO as it's done with the i_mutex
> > held....
>
> We've had a patch[*] to add ext4 parallel directory operations in Lustre for
> a few years, that adds separate locks for each internal tree and leaf block
> instead of using i_mutex, so it scales as the size of the directory grows.
> This definitely improved many-threaded directory create/lookup/unlink
> performance (rename still uses a single lock).

Yup, we can do the same to XFS to implement concurrent modifications.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-05-15 23:43:44

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Thu, May 14, 2015 at 04:57:22PM -0700, Jeremy Allison wrote:
> On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> > On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison <[email protected]> wrote:
> > >
> > > Of course we tell people to just set their filesystems
> > > up using mkfs.xfs -n version=ci :-).
> >
> > So ASCII-only case-insensitivity is sufficient for you guys?
>
> No it's not enough really. But for specific Windows apps that
> use restricted namespaces (and there are such) it works.
>
> ZFS on *BSD does do full case-insensitive lookups (utf8) as part of
> FreeNAS. I think if it's configured as an SMB-only share they turn
> that on.

And some people are using this on Linux:

http://oss.sgi.com/archives/xfs/2014-09/msg00169.html

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-05-15 23:49:43

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 09:30:22AM +1000, NeilBrown wrote:

> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in flight at the one time.
> "make -j" on a large source directory can easily want to create lots of
> "*.o" files at "the same time".
>
> And NFS doesn't need i_mutex at all because the server will provide the
> needed guarantees.

Directory i_mutex is used for a lot more than serialization of fs methods
on said directory - a lot of dcache handling relies upon having it held
around adding dentries/moving them around/making them negative/etc.

Server can't do a damn thing about those, obviously. Neither can it
do anything about multiple lookups on the same name in the same directory,
just because several processes have arrived at the same time with dcache
cold. And no, caching dentry before the end of lookup isn't a good idea
either - you'll get tons of messy corner cases.

If anyone has a usable finer-grained locking scheme, I would _love_ to see it.
All I'd seen from Lustre folks in that area was bringing Stross' mythos to
mind - both as in "reduction of NP to P would be handy for analysis" and
"looking into that thing feels like an excellent way of inviting the gibbering
monstrosities from beyond the spacetime to chew on your cortex".

2015-05-16 00:10:31

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 09:38:08AM +1000, Dave Chinner wrote:

> > Both readdir() and path component lookup are technically read
> > operations, so why the hell do we use a mutex, rather than just
> > get a read-write lock for reading? Yeah, it's that (d) above. I
> > might trust xfs and ext4 to get their internal exclusions for
> > allocations etc right when called concurrently for the same
> > directory. But the others?
>
> They just use a write lock for everything and *nothing changes* -
> this is a simple problem to solve.
>
> The argument "filesystem developers are stupid" is not a
> compelling argument against changing locking. You're just being
> insulting, even though you probably don't realise it.

Er... Remember the clusterfuck around the ->i_size and alignment
checks on XFS DIO writes? Just this cycle. Correctness of XFS
locking is nothing to boast about - it *is* convoluted as hell and you
guys are not superhuman enough to reliably spot the problems in that nest
of horrors. Nobody is.

PS: I've no idea whether I'm being insulting or not and frankly, I don't give
a damn; unlike Linus I hadn't signed off on the "code of conflict" nonsense.
Anyone who feels like complaining is quite welcome to it.

2015-05-16 00:20:10

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 01:10:27AM +0100, Al Viro wrote:

> Er... Remember the clusterfuck around the ->i_size and alignment
> checks on XFS DIO writes? Just this cycle. Correctness of XFS
> locking is nothing to boast about - it *is* convoluted as hell and you
> guys are not superhuman enough to reliably spot the problems in that nest
> of horrors. Nobody is.
>
> PS: I've no idea whether I'm being insulting or not and frankly, I don't give
> a damn; unlike Linus I hadn't signed off on the "code of conflict" nonsense.
> Anyone who feels like complaining is quite welcome to it.

PPS: everyone is also quite welcome to come up with a sane locking scheme,
and unlike complaints about insulting tone, _that_ won't be ignored. Folks,
I'm absolutely serious - if anyone has a good candidate (hell, any candidate),
post it on fsdevel and let's discuss it. I'll do my best to poke holes in
such and if we end up with something working, I'll be bloody happy.

2015-05-16 00:46:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 4:30 PM, NeilBrown <[email protected]> wrote:
>
> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in flight at the one time.
> "make -j" on a large source directory can easily want to create lots of
> "*.o" files at "the same time".
>
> And NFS doesn't need i_mutex at all because the server will provide the
> needed guarantees.

So i_mutex on a directory is probably the nastiest lock we have in the fs layer.

It's used for several different half-related things:

- serialize filename creation/deletion

This is partly for the benefit of the filesystem itself (and not
helpful for NFS, as you note), but it's also very much about making
sure we have uniqueness guarantees at the VFS layer too.

So even with NFS, it's not just "the server provides the needed
guarantees", because some of the guarantees are really client-local.

For example, simply that we only ever have one single dentry for a
particular name, and that we only ever have one active lookup per
dentry. Those things happen independently of - and before - the server
even sees the operation.

So the whole local directory tree consistency ends up depending on this.

- readdir(). This is mostly to make it hard for filesystems to do the
wrong thing when there is concurrent file creation.

I suspect readdir could fairly easily push the i_mutex down from the
caller and into the filesystem, and then filesystems might narrow down
the use (or even get rid of it). The initial patch might even be
automated with coccinelle. However, rather few loads actually have a
lot of readdir() activity, and samba is probably the only major one.
I've seen benchmarks where it matters, but they are rare (and I
haven't seen one in literally years).

So the readdir case could probably be at least relaxed fairly easily.
But the thing that tends to hurt on more loads is, as you note, the
filename lookup/creation/movement case. And that's much harder to fix.

Al, do you have any ideas? Personally, I've wanted to make i_mutex a
rwsem for a long time, but right now pretty much everything uses it
for exclusion. For example, filename lookup is clearly just reading
the directory, so it should take a rwsem for reading, right? No. Not
the way it is done now. Filename lookup wants the directory inode
exclusively because that guarantees that we create just one dentry and
call the filesystem ->lookup only once on that dentry.

Again, there tend to be no simple benchmarks or loads that people care
about that show this. Most of the time it's fairly hard to see.

Linus

2015-05-16 01:23:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 4:38 PM, Dave Chinner <[email protected]> wrote:
>
> Right, because it's cold cache performance that everyone complains
> about.

People really do complain about the hot-cache one too.

Did you read the description of the sample benchmark that Jeremy
described Windows sales people as using?

That kind of thing is actually not that unusual, and they can be big
sales tools.

We went through similar things with "mindcraft", then netbench/dbench.
People will run those benchmarks with enough memory (and often tune
things like dirty thresholds etc) explicitly to get rid of the IO
component for benchmarking reasons.

And often they are just nasty marketing benchmarks and not very
meaningful. The "geekbench of filesystem testing", if you will. Fair
enough. But those kinds of things have also been very useful in making
performance better, because the "real" filesystem benchmarks are
usually too nasty to actually run on reasonable machines. So the
fake/bad ones are often good at showing things that don't scale well
(despite being 100% just CPU-bound) because they show some bottleneck.

And sometimes fixing that bottleneck for the non-IO case ends up
helping the IO case too.

So the one samba profile I remember seeing was probably from early
dbench, I'm pretty sure it was Tridge that showed it as a stress-case
for samba on Linux. So we're talking a decade ago, I really can't
claim I remember the details, but I do remember readdir() being
100% CPU-bound. Or rather, it *would* have been 100% CPU-bound,
but due to the inode semaphore (and back then it was i_sem, I think,
now it's i_mutex) it was actually spending most of the time
sleeping/scheduling due to inode semaphore contention. So rather than
scaling perfectly with CPUs, it just took basically one CPU.

Now, samba has probably changed enormously, and maybe it's not a big
deal. But I don't think our filesystem locking has changed at all,
because quite frankly, nobody else seems to see it. It tends to be a
fileserving thing (the Lustre comment kind of feeds into that).

So it might be interesting to have a simple benchmark that people can
run. WITHOUT the IO load. Because really, IO isn't that interesting to
most of us, especially when we then don't even have IO subsystems that
do much parallelism..

I wrote my own (really really stupid) concurrent stat() test just to
get good profiles of where the real problems are. It's nasty - it's
literally just MAX_THREADS pthreads that loop on doing stat() on a list
of files for ten seconds, and then it reports the total number of
loops. But that stupid thing was actually ridiculously useful, not
because the load is meaningful, but because it ended up showing that
we had horribly fragile behavior when we had contention on the dentry
lock.

(That got fixed, although it still ends up sucking when we fall out of
RCU mode - but with Al's upcoming patches that should hopefully be
really really unusual rather than "every time we see a symlink" etc)
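
The test program itself is not included in this message. A minimal sketch of a test with that shape, written from the description alone, might look like the following. Everything specific here is an assumption: the value of MAX_THREADS, the names `stat_bench`, `stat_worker` and `run_stat_bench`, and the use of time() for the deadline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

#define MAX_THREADS 8   /* assumed value; the original count is not given */

struct stat_bench {
	const char *const *files;  /* list of paths to stat() */
	size_t nfiles;
	time_t deadline;           /* wall-clock time at which to stop */
	atomic_ulong loops;        /* total passes over the list, all threads */
};

static void *stat_worker(void *arg)
{
	struct stat_bench *b = arg;
	struct stat st;

	/* do/while: every worker completes at least one pass. */
	do {
		for (size_t i = 0; i < b->nfiles; i++)
			(void)stat(b->files[i], &st);  /* result deliberately ignored */
		atomic_fetch_add(&b->loops, 1);
	} while (time(NULL) < b->deadline);
	return NULL;
}

/* Run up to MAX_THREADS workers for about `seconds` seconds and return
 * the total number of passes over the file list. */
static unsigned long run_stat_bench(const char *const *files, size_t nfiles,
				    int seconds)
{
	struct stat_bench b = { .files = files, .nfiles = nfiles };
	pthread_t tid[MAX_THREADS];
	int started = 0;

	assert(files != NULL && nfiles > 0 && seconds > 0);
	atomic_init(&b.loops, 0);
	b.deadline = time(NULL) + seconds;

	for (int i = 0; i < MAX_THREADS; i++)
		if (pthread_create(&tid[started], NULL, stat_worker, &b) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&b.loops);
}
```

Calling run_stat_bench(files, nfiles, 10) corresponds to the ten-second run described above. The number is only useful for comparing kernels or configurations on the same machine; the interesting part is the profile taken while it runs.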

Linus

2015-05-16 01:25:18

by NeilBrown

[permalink] [raw]
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, 15 May 2015 17:45:56 -0700 Linus Torvalds
<[email protected]> wrote:

> On Fri, May 15, 2015 at 4:30 PM, NeilBrown <[email protected]> wrote:
> >
> > .. and I've been wondering what to do about i_mutex and NFS. I've had
> > customer reports of slowness in creating files that seems to be due to
> > i_mutex on the directory being held over the whole 'create' RPC, so only one
> > of those can be in flight at the one time.
> > "make -j" on a large source directory can easily want to create lots of
> > "*.o" files at "the same time".
> >
> > And NFS doesn't need i_mutex at all because the server will provide the
> > needed guarantees.
>
> So i_mutex on a directory is probably the nastiest lock we have in the fs layer.
>
> It's used for several different half-related things:
>
> - serialize filename creation/deletion
>
> This is partly for the benefit of the filesystem itself (and not
> helpful for NFS, as you note), but it's also very much about making
> sure we have uniqueness guarantees at the VFS layer too.
>
> So even with NFS, it's not just "the server provides the needed
> guarantees", because some of the guarantees are really client-local.
>
> For example, simply that we only ever have one single dentry for a
> particular name, and that we only ever have one active lookup per
> dentry. Those things happen independently of - and before - the server
> even sees the operation.

But surely those things can be managed with a spinlock.

I think a big part of the problem is that the VFS tries to control
filesystems rather than provide services to them.
If NFS was in control it might:
- ask the dcache to look up a name and get back a dentry: positive, negative
  or not-validated.
- if positive, NFS returns an error, or uses the inode - depending on
  operation.
- otherwise send a 'create' request to the server. At this point it holds
  references to dentries for directory and target, but no locks.
- If an error is returned, just drop references and return.
- On successful create, turn the filehandle into an inode and then
  ask the dcache to link the inode with the target dentry.
  If the dentry is still negative or not-validated, this is trivial.
  If it is positive and already has the right inode, again trivial.
  If it has the wrong inode, you certainly have an interesting problem,
  but one that is specific to NFS (or similar filesystems) and one
  that is up to NFS to solve, not up to the VFS to avoid.
  If the syscall doesn't need to return an 'fd', then just drop the
  references and report success.
  If an 'fd' is required, then create a 'deleted' dentry attached to
  the original parent. As 'mkdir' doesn't return an fd, that should
  be safe (not that all the fuss about directory dentries having exactly one
  parent is particularly relevant to NFS)
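
The decision points in that flow can be sketched in plain C. This is only an illustration of the steps listed above, not kernel code: every `sk_` name, the three-state enum and the `exclusive` flag are hypothetical, and no RPC or dcache call is modelled.

```c
#include <stdbool.h>

/* The three answers the dcache could hand back in this model. */
enum sk_dstate { SK_POSITIVE, SK_NEGATIVE, SK_NOT_VALIDATED };

struct sk_dentry {
    enum sk_dstate state;
    unsigned long ino;   /* meaningful only when state == SK_POSITIVE */
};

enum sk_action { SK_USE_INODE, SK_FAIL_EXIST, SK_SEND_CREATE };
enum sk_link_result { SK_LINKED, SK_ALREADY_LINKED, SK_CONFLICT };

/* Before talking to the server: positive means error or use the inode,
 * anything else means send the 'create' request. */
enum sk_action sk_before_create(const struct sk_dentry *d, bool exclusive)
{
    if (d->state == SK_POSITIVE)
        return exclusive ? SK_FAIL_EXIST : SK_USE_INODE;
    return SK_SEND_CREATE;
}

/* After a successful create: link the new inode to the target dentry. */
enum sk_link_result sk_link_inode(struct sk_dentry *d, unsigned long ino)
{
    if (d->state != SK_POSITIVE) {
        /* still negative or not-validated: the trivial case */
        d->state = SK_POSITIVE;
        d->ino = ino;
        return SK_LINKED;
    }
    if (d->ino == ino)
        return SK_ALREADY_LINKED;   /* someone else linked the same inode */
    return SK_CONFLICT;             /* the "interesting problem", left to the fs */
}
```

The `SK_CONFLICT` result is deliberately left unresolved here, since the text above says that case is for the filesystem to solve.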

I'm not convinced that serialising 'lookup' calls is vital. If two threads
find a 'not-validated' dentry, and both try to look up the inode, they
will both ultimately get the same struct inode from the icache, and will both
succeed in connecting it to the dentry. Obviously it would be better to
avoid two concurrent NFS "LOOKUP" requests, but that is a problem for NFS to
solve. I suspect that using d_fsdata to point to a pending LOOKUP request
would allow the "second" thread to wait for that request to finish. Other
filesystems would take a completely different approach.

But with the VFS trying to be in control and trying to accommodate the needs
of wildly different filesystems, I imagine it might not be so easy.

>
> So the whole local directory tree consistency ends up depending on this.
>
> - readdir(). This is mostly to make it hard for filesystems to do the
> wrong thing when there is concurrent file creation.

and makes it impossible for them to do the right thing too?


>
> I suspect readdir could fairly easily push the i_mutex down from the
> caller and into the filesystem, and then filesystems might narrow down
> the use (or even get rid of it). The initial patch might even be
> automated with coccinelle. However, rather few loads actually have a
> lot of readdir() activity, and samba is probably the only major one.
> I've seen benchmarks where it matters, but they are rare (and I
> haven't seen one in literally years).
>
> So the readdir case could probably be at least relaxed fairly easily.
> But the thing that tends to hurt on more loads is, as you note, the
> filename lookup/creation/movement case. And that's much harder to fix.
>
> Al, do you have any ideas? Personally, I've wanted to make i_mutex a
> rwsem for a long time, but right now pretty much everything uses it
> for exclusion. For example, filename lookup is clearly just reading
> the directory, so it should take a rwsem for reading, right? No. Not
> the way it is done now. Filename lookup wants the directory inode
> exclusively because that guarantees that we create just one dentry and
> call the filesystem ->lookup only once on that dentry.
>
> Again, there tend to be no simple benchmarks or loads that people care
> about that show this. Most of the time it's fairly hard to see.
>

Which "this"? Readdir? I haven't come across one (NFS READDIR issues are
all about when to use READDIR_PLUS and when not to).
Or create-contention-on-i_mutex? The load my customer had was 'make -j60' on
a many-cpu machine. I'm not sure if all the .o files were in the one
directory or if some logging was being recorded to multiple files in that one
directory, but the serialization of file-creation over NFS was measurable.
It wasn't horribly bad, but it was noticeably suboptimal. I think I
recommended creating whatever-it-was in multiple directories.

NeilBrown



2015-05-16 01:38:59

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 05:45:56PM -0700, Linus Torvalds wrote:

> Al, do you have any ideas? Personally, I've wanted to make i_mutex a
> rwsem for a long time, but right now pretty much everything uses it
> for exclusion. For example, filename lookup is clearly just reading
> the directory, so it should take a rwsem for reading, right? No. Not
> the way it is done now. Filename lookup wants the directory inode
> exclusively because that guarantees that we create just one dentry and
> call the filesystem ->lookup only once on that dentry.

rwsem by itself won't do us much good there. Look: for multiple lookups on
the same existing entry we could try to teach d_splice_alias() to cope,
etc. But what happens when a bunch of processes look for the same
nonexistent entry? And no, "who cares about fuckloads of negatives with
the same name" isn't a good answer - suppose we do mkdir() after that.
OK, so we'll find a negative dentry in dcache. And tell the filesystem
to create the sucker. Done. Made it positive. Now, do we hunt down
all _other_ negative dentries for it? Or never keep negative ones at
all? Or slap some kind of ->d_revalidate() there to catch all negative
dentries created before the last mkdir/creat/mknod/symlink/link in given
parent?

One possibility would be a new dentry state - "being looked up". Hashed,
treated as "fall out of RCU mode" for lazy pathwalk purposes, and places
where we call ->lookup() would (while still holding ->i_mutex on parent
shared) wait for that state to end. Places where we call ->d_revalidate()
(with or without ->i_mutex on parent) would also wait on those.

It would need a careful analysis of tree-walkers, though. Doable, but there
might be dragons. In case of e.g. ceph - swamp ones, with mirror in the line
of sight...

2015-05-16 01:47:08

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 6:25 PM, NeilBrown <[email protected]> wrote:
>>
>> For example, simply that we only ever have one single dentry for a
>> particular name, and that we only ever have one active lookup per
>> dentry. Those things happen independently of - and before - the server
>> even sees the operation.
>
> But surely those things can be managed with a spinlock.

Some of them could. But no, not in general. Exactly because of that
"only one lookup of this particular dentry". The lookup sleeps, so
when another thread tries to look up the same dentry, it needs to hit
a sleeping lock. And since at that point we don't have the new dentry
yet (much less the inode we're looking up), it is in the only place we
*do* have: the inode of the directory we're looking up.

Now, maybe we could solve it with a new sleeping lock in the dentry
itself. Maybe we could allocate the new dentry early, add it to the
directory the usual way, but mark it as being "not ready" (so that
d_lookup() wouldn't use it). And have the sleeping lock be a new
sleeping lock in the dentry.

But that would be a very big change. And I suspect Al has tons of
reasons why it would have various problems. We do rely on the i_mutex
a fair amount..

> I think a big part of the problem is that the VFS tries to control
> filesystems rather than provide services to them.

Well, most of the time we do aim to provide services (eg the page
cache generally works that way).

But the dentry layer really is pretty important. And it's a *huge*
performance win. Yes, yes, you don't see it when it works, you only
see the bad cases, and there is no way in hell we can let the
low-level filesystem control the dentry tree.

Also, remember: we do have something like 65 different filesystems.
It's a *big* deal to make them all work "well enough". Making the
locking rules unambiguous is pretty primary.

Making one particular filesystem perform well is important to _you_,
but in the big picture...

>> - readdir(). This is mostly to make it hard for filesystems to do the
>> wrong thing when there is concurrent file creation.
>
> and makes it impossible for them to do the right thing too?

Well, possibly. As mentioned, the readdir case might actually be
pretty trivial, but it just hasn't come up often.

I've literally only ever seen it for samba. I'd take a tested patch. Hint, hint.

>> Again, there tend to be no simple benchmarks or loads that people care
>> about that show this. Most of the time it's fairly hard to see.
>
> Which "this"? Readdir? I haven't come across one (NFS READDIR issues are
> all about when to use READDIR_PLUS and when not to).
> Or create-contention-on-i_mutex? The load my customer had was 'make -j60' on
> a many-cpu machine.

Both/either. The problem is that the "make -j60" case seems to depend
on the filesystem having bad latency - it certainly doesn't show up on
a local filesystem. Maybe that could be simulated some way..

Linus

2015-05-16 01:48:26

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> But surely those things can be managed with a spinlock.
>
> I think a big part of the problem is that the VFS tries to control
> filesystems rather than provide services to them.

What with being the thing syscalls talk to for sending the requests to
filesystems... Do you really want to push the pathname resolution into
fs code? You've looked at it lately, right?

> I'm not convinced that serialising 'lookup' calls is vital. If two threads
> find a 'not-validated' dentry, and both try to look up the inode, they
> will both ultimately get the same struct inode from the icache, and will both
> succeed in connecting it to the dentry. Obviously it would be better to
> avoid two concurrent NFS "LOOKUP" requests, but that is a problem for NFS to
> solve. I suspect that using d_fsdata to point to a pending LOOKUP request
> would allow the "second" thread to wait for that request to finish. Other
> filesystems would take a completely different approach.

See upthread regarding multiple negative dentries with the same name and fun
consequences thereof. There might be _NO_ inode. At all. dcache has a large
negative component and without it you'd get really fucked on NFS as soon
as you try to compile anything. Shitloads of headers, looked up in a lot of
directories. Most of the lookups ending up negative. We really do need that
stuff...

2015-05-16 01:55:46

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 06:47:04PM -0700, Linus Torvalds wrote:

> Now, maybe we could solve it with a new sleeping lock in the dentry
> itself. Maybe we could allocate the new dentry early, add it to the
> directory the usual way, but mark it as being "not ready" (so that
> d_lookup() wouldn't use it). And have the sleeping lock be a new
> sleeping lock in the dentry.

See upthread. It might be doable (provided that we turn ->i_mutex into
rwsem, to keep the exclusion with directory _modifiers_), but it'll need
a really non-trivial code review of a bunch of filesystems, especially ones
that want to play with the list of children like ceph does. And things
like sillyrename and dcache-populating readdir instances, albeit not as scary
as ceph. And then there's lustre...

2015-05-16 02:23:16

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 6:55 PM, Al Viro <[email protected]> wrote:
>
> See upthread. It might be doable (provided that we turn ->i_mutex into
> rwsem, to keep the exclusion with directory _modifiers_), but it'll need
> a really non-trivial code review of a bunch of filesystems, especially ones
> that want to play with the list of children like ceph does. And things
> like sillyrename and dcache-populating readdir instances, albeit not as scary
> as ceph. And then there's lustre...

Yup.

I don't think it's viable if we can't do it gradually, and leave
filesystems with the option to basically keep the existing locking.
Because most won't care that deeply anyway, and some have
complications like ceph.

But we might be able to do *some* changes that wouldn't be that
noticeable. For example, something like

- phase 1:

Turn i_mutex into an rwsem, change all users to take it for writing

This part should be pretty much a semantic no-op.

- phase 2:

For filesystems that say that they are ok with it, make lookup_slow()
(and *only* lookup_slow for now) instead take the rwsem for reading,
but in addition to that, take a hashed mutex.

By "hashed mutex", I mean having a smallish table of mutexes (say,
1024), and just creating a hash based on the name-hash and the parent
pointer. That way we can avoid all the issues with adding a new lock
to the dentry itself, or having to allocate a new child dentry just
for the lock. It *could* cause some cross-directory serialization due
to hash collisions, but that shouldn't be noticeable if the hash is of
a reasonable size and quality.

That would allow lookups (and _only_ lookups) to happen in parallel,
but the hashed mutex would mean that you'd serialize the "same name in
same directory" case. And we'd require filesystems to say "I can
support this concurrent lookup model".
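
A minimal userspace sketch of the "hashed mutex" part, assuming pthread mutexes stand in for kernel mutexes. The `sk_` names, the multiplier and the callback signature are all hypothetical, and the shared rwsem that phase 2 would also take is left out; only the table lookup keyed on (parent pointer, name hash) is shown.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SK_LOOKUP_MUTEX_BITS  10
#define SK_LOOKUP_MUTEX_COUNT (1u << SK_LOOKUP_MUTEX_BITS)  /* 1024 slots */

static pthread_mutex_t sk_lookup_mutexes[SK_LOOKUP_MUTEX_COUNT];

/* Must be called once before any lookup (hypothetical setup hook). */
void sk_lookup_mutexes_init(void)
{
    for (size_t i = 0; i < SK_LOOKUP_MUTEX_COUNT; i++)
        pthread_mutex_init(&sk_lookup_mutexes[i], NULL);
}

/* Mix the parent pointer with the name hash and keep the top bits. */
unsigned int sk_lookup_mutex_index(const void *parent, uint32_t name_hash)
{
    uint64_t h = (uint64_t)(uintptr_t)parent ^ name_hash;

    h *= 0x9E3779B97F4A7C15ULL;  /* multiplicative hashing constant */
    return (unsigned int)(h >> (64 - SK_LOOKUP_MUTEX_BITS));
}

/* Run fs_lookup with "same name in same directory" callers serialized.
 * Unrelated names may collide on a slot; that costs time, not correctness. */
int sk_lookup_serialized(const void *parent, uint32_t name_hash,
                         int (*fs_lookup)(void *), void *arg)
{
    pthread_mutex_t *m =
        &sk_lookup_mutexes[sk_lookup_mutex_index(parent, name_hash)];
    int ret;

    pthread_mutex_lock(m);
    ret = fs_lookup(arg);
    pthread_mutex_unlock(m);
    return ret;
}

/* Example callback: counts how many times the "filesystem" was asked. */
int sk_count_lookup(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

Because the mutex is picked from a fixed table, nothing has to be allocated per lookup and no lock has to live in the not-yet-existing child dentry.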

There might be a "phase 3" and so on where we could expand this to
slightly more than just lookup_slow(), but I suspect that even doing
it *just* there would already catch the bulk of issues. And requiring
filesystems to sign up for it means that we can ignore any ugly cases.

I dunno. The above _sounds_ fairly safe and easy because of how it
limits the impact. But I might be missing something.

Linus

2015-05-16 03:16:55

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 07:23:11PM -0700, Linus Torvalds wrote:

> For filesystems that say that they are ok with it, make lookup_slow()
> (and *only* lookup_slow for now) instead take the rwsem for reading,
> but in addition to that, take a hashed mutex.
>
> By "hashed mutex", I mean having a smallish table of mutexes (say,
> 1024), and just creating a hash based on the name-hash and the parent
> pointer. That way we can avoid all the issues with adding a new lock
> to the dentry itself, or having to allocate a new child dentry just
> for the lock. It *could* cause some cross-directory serialization due
> to hash collisions, but that shouldn't be noticeable if the hash is of
> a reasonable size and quality.

What for? All we need is a flag, waitqueue and being woken
up when the flag gets cleared. So let's just use the queue of parent's
->i_mutex and explicitly kick it when removing dentry flag. We *are*
holding a reference on parent (we need that to hold that sucker shared,
after all), so it's not going away under us...

I'm all for gradual transformations, but in this case I suspect
that doing it on per-fs basis isn't the best way to do it; gradual massage
of code using dcache lookups or walking the lists of children in filesystems
(fortunately, it's fairly rare these days, and we only need to care about
the code checking if such a beast is hashed; d_alloc() already places new
dentry on the list of children) would seem to be a better approach. We'd
also need to audit fs/dcache.c tree-walking-related code itself, but that's
much more limited.

2015-05-16 04:33:00

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 08:37:20PM -0700, Linus Torvalds wrote:
> On May 15, 2015 8:17 PM, "Al Viro" <[email protected]> wrote:
> >
> > What for? All we need is a flag, waitqueue and being woken
> > up when the flag gets cleared.
>
> You need to have the flag somewhere.
>
> The child dentry doesn't exist yet.
>
> That's the point of the hashed entry. It approximates the not-yet-existing
> child dentry that we have *not* added to the parent until after lookup.

Point, but... A lot of our problems come from the fact that ->i_mutex
doubles as protection against the addition to the list of children, on
top of protection of directory itself. What if we do the following:
have the normal case of __lookup_hash() (and other callers of lookup_real())
* allocate dentry, marked "in-lookup"
* do dcache lookup, likely to come up empty, _without_ touching
potential matches' d_lock, i.e. based on __d_lookup_rcu() (under
rcu_read_lock(), with rename_lock loop around it). Hold parent's ->d_lock
while walking the chain, grab refcount in the unlikely case the match had
been found. If nothing's found *and* rename_lock hadn't been touched, insert
the new dentry into hash and list of children before dropping ->d_lock.
* call ->lookup() (still under ->i_mutex, shared)
* clear "in-lookup" bit on _original_ dentry (we might very well
have returned a different one)
* kick the wait queue of parent's ->i_mutex

I'll need to think about that after I get some sleep, but it smells like
that could be feasible. Of course, that assumes we'll be able to cope
with hashed-but-currently-in-lookup dentries, but I think it might be
doable with some massage...
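
The flag-plus-wakeup part of that sequence can be sketched in userspace C. This is an assumption-laden illustration, not the proposed kernel change: a pthread mutex stands in for the parent's ->d_lock, a condition variable stands in for the wait queue of the parent's ->i_mutex, all `sk_` names are hypothetical, and the dcache hash walk and rename_lock loop are not modelled.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_dir {
    pthread_mutex_t lock;       /* stands in for the parent's ->d_lock */
    pthread_cond_t  lookup_wq;  /* stands in for the parent's wait queue */
};

struct sk_dentry {
    struct sk_dir *parent;
    bool in_lookup;   /* the "in-lookup" mark */
    void *inode;      /* NULL after lookup means a negative dentry */
};

void sk_dir_init(struct sk_dir *dir)
{
    pthread_mutex_init(&dir->lock, NULL);
    pthread_cond_init(&dir->lookup_wq, NULL);
}

/* The dentry becomes visible to others, marked "in-lookup". */
void sk_lookup_begin(struct sk_dentry *d, struct sk_dir *parent)
{
    pthread_mutex_lock(&parent->lock);
    d->parent = parent;
    d->inode = NULL;
    d->in_lookup = true;
    pthread_mutex_unlock(&parent->lock);
}

/* After ->lookup(): record the result, clear the bit, kick the waiters. */
void sk_lookup_end(struct sk_dentry *d, void *inode)
{
    pthread_mutex_lock(&d->parent->lock);
    d->inode = inode;
    d->in_lookup = false;
    pthread_cond_broadcast(&d->parent->lookup_wq);
    pthread_mutex_unlock(&d->parent->lock);
}

/* A second thread that found the dentry sleeps until the bit clears. */
void *sk_lookup_wait(struct sk_dentry *d)
{
    void *inode;

    pthread_mutex_lock(&d->parent->lock);
    while (d->in_lookup)
        pthread_cond_wait(&d->parent->lookup_wq, &d->parent->lock);
    inode = d->inode;
    pthread_mutex_unlock(&d->parent->lock);
    return inode;
}
```

One wait queue per parent means a wakeup for one name also wakes waiters for other names in the same directory; the `while` loop re-checks the flag so that is harmless.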

2015-05-16 04:45:40

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, 16 May 2015 02:47:18 +0100 Al Viro <[email protected]> wrote:

> On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> > But surely those things can be managed with a spinlock.
> >
> > I think a big part of the problem is that the VFS tries to control
> > filesystems rather than provide services to them.
>
> What with being the thing syscalls talk to for sending the requests to
> filesystems... Do you really want to push the pathname resolution into
> fs code? You've looked at it lately, right?

Yes, I've looked lately :-)
I think that all of RCU-walk, and probably some of REF-walk should happen
before the filesystem gets to see anything.
But once you hit a non-positive dentry or the parent of the target name, I'd
rather hand over to the FS.

NFSv4 has the ability to look up multiple components in a single LOOKUP call.
VFS doesn't give it a chance to try because it wants to go step-by-step, and
wants each entry in the cache to have an inode etc.

The earlier the filesystem gets control, the less completely-general the VFS
needs to be.

>
> > I'm not convinced that serialising 'lookup' calls is vital. If two threads
> > find a 'not-validated' dentry, and both try to look up the inode, they
> > will both ultimately get the same struct inode from the icache, and will both
> > succeed in connecting it to the dentry. Obviously it would be better to
> > avoid two concurrent NFS "LOOKUP" requests, but that is a problem for NFS to
> > solve. I suspect that using d_fsdata to point to a pending LOOKUP request
> > would allow the "second" thread to wait for that request to finish. Other
> > filesystems would take a completely different approach.
>
> See upthread regarding multiple negative dentries with the same name and fun
> consequences thereof. There might be _NO_ inode. At all. dcache has a large
> negative component and without it you'd get really fucked on NFS as soon
> as you try to compile anything. Shitloads of headers, looked up in a lot of
> directories. Most of the lookups ending up negative. We really do need that
> stuff...

Of course negative dentries are important and having multiple would be
unfortunate. I don't suggest that for a moment.
I'm suggesting three different states for a dentry: positive, negative, don't
know. "don't know" is a new state that isn't currently allowed.

While a filesystem is performing 'lookup', doing its own locking or not, the
dentry would be "don't know". Anything that needed to know would block
somewhere in the filesystem code on whatever lock or waitqueue or whatever
that the filesystem developer felt was appropriate. On i_mutex if
generic_foo() was in use.

If NFSv4 did a multi-component lookup, the intermediate dentries would be
"don't know" even while they had children. For local filesystems, that sort
of thing would never happen. For NFS - which has to allow for random changes
on the server anyway - it is just part of the game.

NeilBrown



2015-05-16 05:46:33

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:

> Yes, I've looked lately :-)
> I think that all of RCU-walk, and probably some of REF-walk should happen
> before the filesystem gets to see anything.
> But once you hit a non-positive dentry or the parent of the target name, I'd
> rather hand over to the FS.

... and be ready to get it back when the sucker runs into a symlink. Unless
you want to handle _those_ in NFS somehow (including an absolute one starting
with /sys/, etc.).

> NFSv4 has the ability to look up multiple components in a single LOOKUP call.
> VFS doesn't give it a chance to try because it wants to go step-by-step, and
> wants each entry in the cache to have an inode etc.

Do tell, how do we deal with .. afterwards if we leave the intermediate ones
without inodes? We _could_ feed multi-component requests to filesystems
(and NFSv4 isn't the first one to handle that - 9p had been there a lot
earlier), but then you get to
* populate all of them with inodes
* be damn careful to avoid multiple dentries for the same directory
inode
Look, creating those suckers isn't the worst part; you need to be ready for
e.g. mount(2) or pathname resolution playing with the ones you'd created.
It's not fs-private data structure; pathname resolution might very well span
many filesystem types.

Worse, you get to deal with several multi-component requests jumping into
fs at the same place. With responses arriving a bit afterwards, and guess
what? Those requests happen to share bits and pieces of prefixes. Oh,
and one of them is a rename. Dealing with just the final components isn't
a problem; you'll need to deal with directory tree in all its fscking glory.
In a way that wouldn't be in too incestuous relationship with the pathwalking
logics in VFS and, by that proxy, such in all other fs types.

In particular, "unknown" for intermediate nodes is a recipe for really
nasty mess. If the path can rejoin the known universe several components
later... <shudder>

Dealing with multi-component lookups isn't impossible and might be a good
idea, but only if all intermediates are populated. What information does
NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
one per component, and that is best used as multi-component revalidate
on hot dcache...

2015-05-16 14:19:20

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 06:46:26AM +0100, Al Viro wrote:

> Dealing with multi-component lookups isn't impossible and might be a good
> idea, but only if all intermediates are populated. What information does
> NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> one per component, and that is best used as multi-component revalidate
> on hot dcache...

Having reread the RFC... What's the problem with intermediates?
Just put GETFH and GETATTR between the LOOKUP for each component in
the same compound and be done with that - you've got yourself everything
you might possibly need for populating them. Confused...
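
As a rough sketch of what such a compound looks like, the following C builds the operation sequence for a multi-component walk. Only the operation names (PUTFH, LOOKUP, GETFH, GETATTR) come from NFSv4; the `sk_` types and the builder function are hypothetical, and no XDR encoding or attribute mask is modelled.

```c
#include <stddef.h>

enum sk_op { SK_PUTFH, SK_LOOKUP, SK_GETFH, SK_GETATTR };

struct sk_compound_op {
    enum sk_op op;
    const char *name;  /* component name for SK_LOOKUP, NULL otherwise */
};

/* Fill ops[] with PUTFH for the starting directory, then for every
 * component LOOKUP + GETFH + GETATTR, so each intermediate can be
 * populated from the reply. Returns the number of ops written, or 0
 * if max_ops is too small. */
size_t sk_build_walk(const char *const *names, size_t n_names,
                     struct sk_compound_op *ops, size_t max_ops)
{
    size_t n = 0;

    if (max_ops < 1 + 3 * n_names)
        return 0;

    ops[n++] = (struct sk_compound_op){ SK_PUTFH, NULL };
    for (size_t i = 0; i < n_names; i++) {
        ops[n++] = (struct sk_compound_op){ SK_LOOKUP, names[i] };
        ops[n++] = (struct sk_compound_op){ SK_GETFH, NULL };
        ops[n++] = (struct sk_compound_op){ SK_GETATTR, NULL };
    }
    return n;
}
```

For three components this produces ten operations: one PUTFH followed by three LOOKUP/GETFH/GETATTR triples, in root-to-leaf order.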

BTW, I would still very much prefer to allocate a chain of
dentries in fs/namei.c (yes, marking them "in-lookup"), then give an
array of pointers (or beginning and end of the chain, but that can
be more delicate due to dentry tree topology changes from e.g.
d_materialize_unique(), aka d_splice_alias() these days). With
the requirement being "populate them in root-to-leaves order, do nothing
for ones that had in-lookup flag already cleared".

Another fun possibility (but that would take somewhat more
restructuring in fs/namei.c) would be to have (on hot cache) a path
traced for several components, seeing that they are all on the same
fs and delaying revalidation for a while. With bulk revalidate covering
all the chain when we stumble across .., mountpoint or something we believe
to be a symlink, or when the chain reaches fs-specified limit. Said bulk
revalidate should tell how long a prefix had been OK. Permission change
handling would be the painful part here...

2015-05-16 19:36:11

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 9:31 PM, Al Viro <[email protected]> wrote:
>
> Point, but... A lot of our problems come from the fact that ->i_mutex
> doubles as protection against the addition to the list of children, on
> top of protection of directory itself.

Yeah, ok, we'd need to change that too. Maybe just make it use d_lock..

But yes, I like your alternative:

> What if we do the following:
> have the normal case of __lookup_hash() (and other callers of lookup_real())
> * allocate dentry, marked "in-lookup"
> * do dcache lookup, likely to come up empty, _without_ touching
> potential matches' d_lock, i.e. based on __d_lookup_rcu() (under
> rcu_read_lock(), with rename_lock loop around it). Hold parent's ->d_lock
> while walking the chain, grab refcount in the unlikely case the match had
> been found. If nothing's found *and* rename_lock hadn't been touched, insert
> the new dentry into hash and list of children before dropping ->d_lock.
> * call ->lookup() (still under ->i_mutex, shared)
> * clear "in-lookup" bit on _original_ dentry (we might very well
> have returned a different one)
> * kick the wait queue of parent's ->i_mutex

I agree, that should work too, and might be somewhat advantageous. And
we do have that extra dentry, since we pass it down (for the name) to
lookup anyway. We'd just hash it and have that magical state.

Anyway, just grepping for "i_mutex" made me almost cry.

So phase 1 should probably be to not even touch i_mutex, but just add
a new abstraction layer to get rid of the direct lock accesses. That
will make things easier down the line.
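
A minimal userspace sketch of what such an abstraction layer buys, assuming a pthread mutex in place of the kernel mutex. The helper names (`sk_inode_lock()` and friends) are hypothetical and are not taken from the attached patch; the point is only that callers stop naming the lock field.

```c
#include <pthread.h>

struct sk_inode {
    pthread_mutex_t i_mutex;   /* only the helpers below touch this field */
    unsigned int nr_entries;
};

void sk_inode_init(struct sk_inode *inode)
{
    pthread_mutex_init(&inode->i_mutex, NULL);
    inode->nr_entries = 0;
}

/* The whole "abstraction layer": two trivial wrappers. Swapping the
 * mutex for a reader/writer lock later would change these bodies and
 * the struct field, not the callers. */
void sk_inode_lock(struct sk_inode *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
}

void sk_inode_unlock(struct sk_inode *inode)
{
    pthread_mutex_unlock(&inode->i_mutex);
}

/* A caller written against the wrappers; it never mentions i_mutex. */
unsigned int sk_dir_add_entry(struct sk_inode *dir)
{
    unsigned int n;

    sk_inode_lock(dir);
    n = ++dir->nr_entries;
    sk_inode_unlock(dir);
    return n;
}
```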

The attached patch is huge, but it's all automated, and shouldn't
change any semantics at all - except to make it much easier to change
the locking details later. What do you think?

There are still a lot of "i_mutex" references in comments (several of
them clearly just mindless search-and-replace from when it used to be
a semaphore, when we _didn't_ do this cleanup: look for "down"
mentions ;) and there's a few scattered actual uses for initialization
and for two cases of 'mutex_lock_killable()' that I didn't bother to
make a wrapper for etc. But this should make it much easier to change
things eventually if we want to (ie we could turn it all into a rwsem
with a rather small patch for testing)

Linus


Attachments:
patch.diff.gz (51.18 kB)

2015-05-17 03:03:53

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, 16 May 2015 06:46:26 +0100 Al Viro <[email protected]> wrote:

> On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:
>
> > Yes, I've looked lately :-)
> > I think that all of RCU-walk, and probably some of REF-walk should happen
> > before the filesystem gets to see anything.
> > But once you hit a non-positive dentry or the parent of the target name, I'd
> > rather hand over to the FS.
>
> ... and be ready to get it back when the sucker runs into a symlink. Unless
> you want to handle _those_ in NFS somehow (including an absolute one starting
> with /sys/, etc.).

Certainly - when a symlink or mountpoint is found, the filesystem stops.
mountpoints should be rarely hit because the path to a mountpoint will
usually be stable enough for RCU-walk to find it....
Thinks: I wonder what happens when a mounted-on NFS directory is deleted on the
server...

automountpoints would be handled completely by the filesystem. It would
mount something and then return saying "Look, I found a mount point - you
wanna handle it for me?".


>
> > NFSv4 has the ability to look up multiple components in a single LOOKUP call.
> > VFS doesn't give it a chance to try because it wants to go step-by-step, and
> > wants each entry in the cache to have an inode etc.
>
> Do tell, how do we deal with .. afterwards if we leave the intermediate ones
> without inodes? We _could_ feed multi-component requests to filesystems
> (and NFSv4 isn't the first one to handle that - 9p had been there a lot
> earlier), but then you get to
> * populate all of them with inodes
> * be damn careful to avoid multiple dentries for the same directory
> inode

NFS directories already need to be revalidated occasionally. Having a dentry
in "unknown" state just means a revalidation is that much more likely.

Suppose I cd into a directory, then rename the directory on the server. What
happens? What should happen?
I could make a case that the NFS client should lookup ".." on the server and
rebuild the path upwards.

There is a (to me) really key point here.
Local filesystems use the dcache for correctness. It prevents concurrent
directory renames from creating loops and it ensures that only one file of a
given name exists in each directory.

Remote filesystems don't use it for correctness. For them it is simply an
optimisation. So getting upset about directories with multiple dentries, or
directories that aren't connected to the root is very important for local
filesystems, and largely irrelevant for network filesystems.

A local filesystem needs the cache to remain consistent with storage. A
network filesystem cannot possibly ensure that the cache is consistent with
storage, and just needs to be able to notice the more offensive
inconsistencies reasonably quickly, and repair them.

> Look, creating those suckers isn't the worst part; you need to be ready for
> e.g. mount(2) or pathname resolution playing with the ones you'd created.
> It's not fs-private data structure; pathname resolution might very well span
> many filesystem types.

Any pathname lookup which touched these dentries would call d_revalidate()
(or similar) which could get the inode etc if it was really needed.

>
> Worse, you get to deal with several multi-component requests jumping into
> fs at the same place. With responses arriving a bit afterwards, and guess
> what? Those requests happen to share bits and pieces of prefixes. Oh,
> and one of them is a rename. Dealing with just the final components isn't
> a problem; you'll need to deal with directory tree in all its fscking glory.
> In a way that wouldn't be in too incestuous relationship with the pathwalking
> logics in VFS and, by that proxy, such in all other fs types.
>
> In particular, "unknown" for intermediate nodes is a recipe for really
> nasty mess. If the path can rejoin the known universe several components
> later... <shudder>
>
> Dealing with multi-component lookups isn't impossible and might be a good
> idea, but only if all intermediates are populated. What information does
> NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> one per component, and that is best used as multi-component revalidate
> on hot dcache...

I was remembering RFC3010 in which a LOOKUP had a "pathname4" which was an
array of "component4". It could just return the filehandle and attributes of
the final target.
RFC3530 and later revised that so "LOOKUP" gets a "component4". That just
means that it is easy to get the attributes if you want them.

I'm not really saying that multiple component lookups are a good idea, or
that doing the lookup and not getting the intermediate attributes is a
sensible approach. What I'm really pointing out is that the current dcache
imposes a particular model very strongly on filesystems and I'm far from
convinced that that is a good idea.

NeilBrown



2015-05-17 03:12:21

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, 16 May 2015 15:18:11 +0100 Al Viro <[email protected]> wrote:

> On Sat, May 16, 2015 at 06:46:26AM +0100, Al Viro wrote:
>
> > Dealing with multi-component lookups isn't impossible and might be a good
> > idea, but only if all intermediates are populated. What information does
> > NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> > one per component, and that is best used as multi-component revalidate
> > on hot dcache...
>
> Having reread the RFC... What's the problem with intermediates?
> Just put GETFH and GETATTR between the LOOKUP for each component in
> the same compound and be done with that - you've got yourself everything
> you might possibly need for populating them. Confused...

The problem isn't getting intermediates. The problem is that not having
intermediates confuses the dcache. When the dcache is just providing a
caching service, and not providing a consistency service, then it shouldn't
let itself get confused.

>
> BTW, I would still very much prefer to allocate a chain of
> dentries in fs/namei.c (yes, marking them "in-lookup"), then gave an
> array of pointers (or beginning and end of the chain, but that can
> be more delicate due to dentry tree topology changes from e.g.
> d_materialize_unique(), aka d_splice_alias() these days). With
> the requirement being "populate them in root-to-leaves order, do nothing
> for ones that had in-lookup flag already cleared".
>
> Another fun possibility (but that would take somewhat more
> restructuring in fs/namei.c) would be to have (on hot cache) a path
> traced for several components, seeing that they are all on the same
> fs and delaying revalidation for a while. With bulk revalidate covering
> all the chain when we stumble across .., mountpoint or something we believe
> to be a symlink, or when the chain reaches fs-specified limit. Said bulk
> revalidate should tell how long a prefix had been OK. Permission change
> handling would be the painful part here...

This seems to just entrench the approach that the dcache is in control and
you are trying to contort it in some unnecessary way to meet one more need.

In the common case, the rcu_walk version of d_revalidate would be very simple
and batching them isn't going to buy much. Once you hit a "needs revalidate"
or anything else that trips up rcu_walk, just hand it all to the filesystem
and say "your problem". If the filesystem wants to continue one step at a
time, that is easy - there are helpers for that. If the filesystem wants to
send the remainder of the path to the server, it can do that too (being
careful of symlinks and mount points of course). Just give the filesystem
control for the rare slow path.

NeilBrown



2015-05-17 03:48:24

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 8:12 PM, NeilBrown <[email protected]> wrote:
>
> The problem isn't getting intermediates. The problem is that not having
> intermediates confuses the dcache. When the dcache is just providing a
> caching service, and not providing a consistency service, then it shouldn't
> let itself get confused.

The dcache is much more than just a cache. It *is* in control. You may
not like it, but in the big picture, one odd filesystem not liking it
isn't a big deal.

Sorry, but that really is how it is. NFS isn't special enough for some
badly designed lookup models to matter one whit. And caching is *so*
effective, that the actual lookup case isn't the primary thing.

Yes, you can find loads where caching doesn't work well. But they are
odd and not all that important. The cases where caches dominate are
*much* more common.

Designing things around the 1% case (or the 0.01%) would be completely
insane. The dcache and the vfs is designed for the 99.9% case.

Linus

2015-05-17 04:04:47

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 8:48 PM, Linus Torvalds
<[email protected]> wrote:
>
> Sorry, but that really is how it is. NFS isn't special enough for some
> badly designed lookup models to matter one whit.

Btw, it's not just about performance, although the whole "we can do
cached lookups without ever having to get the filesystem involved" is a
big deal.

It's about getting fundamental concepts like mount points etc right,
it's about all those things that the filesystem really doesn't
know about, and _cannot_ sanely know about.

It's now about things like overlayfs etc, all those things.

So the filesystem really isn't in control. Never will be. The
filesystem is at the mercy of (extended) unix semantics that are
bigger than the filesystem.

This is true of IO too. The filesystem does have a bit more
flexibility, but in the end, you have to do the readpage thing,
because it's the only way you'll get mmap. The filesystem isn't really
in control there either, there are strict rules for what it has to do
in order to have reasonable coherent mmap semantics.

So the vfs layer often does have a "library" approach, because
filesystems may do things in very different ways. But at the same
time, the vfs layer really *is* in control, because it's the vfs layer
that enforces certain basic semantics. So the dcache very much isn't
just some "slave cache" that you choose to use and is at the control of
the filesystem. Like the page cache, you don't get a choice, because
you aren't in charge.

When somebody does a lookup of a filename, it is not a "pass this
filename to the filesystem". It very much *is* a
component-by-component lookup. And in the *vast* majority of the
cases, the cached lookup when you don't even get asked is absolutely
the right thing to do, and doing anything else wouldn't just be wrong,
it would be completely and utterly stupid.

And the fact that somebody doesn't understand that, and has designed
bad extensions to do multi-component lookup, isn't actually an
argument against the dcache. It's just an argument for "people make
bad interfaces because they hack things up and don't understand
things".

Linus

2015-05-17 04:48:06

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, 16 May 2015 21:04:34 -0700 Linus Torvalds
<[email protected]> wrote:

> On Sat, May 16, 2015 at 8:48 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Sorry, but that really is how it is. NFS isn't special enough for some
> > badly designed lookup models to matter one whit.
>
> Btw, it's not just about performance, although the whole "we can do
> cached lookups without ever having to get the filesystem involved" is a
> big deal.
>
> It's about getting fundamental concepts like mount points etc right,
> it's about all those things that the filesystem really doesn't
> know about, and _cannot_ sanely know about.
>
> It's now about things like overlayfs etc, all those things.
>
> So the filesystem really isn't in control. Never will be. The
> filesystem is at the mercy of (extended) unix semantics that are
> bigger than the filesystem.
>
> This is true of IO too. The filesystem does have a bit more
> flexibility, but in the end, you have to do the readpage thing,
> because it's the only way you'll get mmap. The filesystem isn't really
> in control there either, there are strict rules for what it has to do
> in order to have reasonable coherent mmap semantics.
>
> So the vfs layer often does have a "library" approach, because
> filesystems may do things in very different ways. But at the same
> time, the vfs layer really *is* in control, because it's the vfs layer
> that enforces certain basic semantics. So the dcache very much isn't
> just some "slave cache" that you choose to use and is at the control of
> the filesystem. Like the page cache, you don't get a choice, because
> you aren't in charge.

Last I checked, sysfs doesn't use the page cache.
Of course, sysfs is a special case (but then, aren't we all -- deep down).

I like the page cache. Really do. It provides useful services and lots of
'generic' helpers. You can use the 'generic' versions directly, or wrap them
in a little bit of extra code, or rewrite them completely. It's lovely.

>
> When somebody does a lookup of a filename, it is not a "pass this
> filename to the filesystem". It very much *is* a
> component-by-component lookup. And in the *vast* majority of the
> cases, the cached lookup when you don't even get asked is absolutely
> the right thing to do, and doing anything else wouldn't just be wrong,
> it would be completely and utterly stupid.

I think you must have been reading someone else's emails, not mine. I'm
totally there with the cached lookups. They are awesome. Don't want
anything else. But when the cache doesn't have the answer - what then? The
filesystem is most likely to know how to fill the cache most efficiently.

I remember hunting after some problem a while ago. I don't remember the
exact details but it was related to when NFS is asked to perform permission
checks on the way to opening something. I'm pretty sure it involved
atomic_open() as a key part.
Anyway, the code is/was very hairy and seemed to be convoluted in order to
try to meet everybody's needs at once.
Having one piece of code that tries to handle the subtle details for all
filesystems is, I think, a mistake. Certainly have a block of code that does
the 'easy, local filesystem' version. But don't try to combine the
necessarily-different NFS version into the same block of code. It becomes
(nearly) unreadable.

I know there are interesting complex cases for open: O_EXCL and trailing
symlinks and things certainly make it interesting. But pretending that all
filesystems can be squashed into the one mould is just a pretence.

NeilBrown


>
> And the fact that somebody doesn't understand that, and has designed
> bad extensions to do multi-component lookup, isn't actually an
> argument against the dcache. It's just an argument for "people make
> bad interfaces because they hack things up and don't understand
> things".
>
> Linus



2015-05-17 10:56:52

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sat, May 16, 2015 at 09:04:34PM -0700, Linus Torvalds wrote:

> It's now about things like overlayfs etc, all those things.

Er... Bad example, that - overlayfs is _not_ fs-agnostic.

> When somebody does a lookup of a filename, it is not a "pass this
> filename to the filesystem". It very much *is* a
> component-by-component lookup. And in the *vast* majority of the
> cases, the cached lookup when you don't even get asked is absolutely
> the right thing to do, and doing anything else wouldn't just be wrong,
> it would be completely and utterly stupid.
>
> And the fact that somebody doesn't understand that, and has designed
> bad extensions to do multi-component lookup, isn't actually an
> argument against the dcache. It's just an argument for "people make
> bad interfaces because they hack things up and don't understand
> things".

And that is complete crap. Multi-component lookups do make sense; once
we are at the edge of the area present in dcache, we _know_ there won't
be any existing mountpoints involved; parsing the components and feeding
them to fs at once, along with an array of dentries to fill makes perfect
sense. Why bother with a bunch of roundtrips when we can have one?

Another thing is that the likely situation with revalidations is that the
damn thing had been sitting around long enough for wanting to check with
the server, only to get "it's still OK" in response. Again, when that
happens it makes perfect sense to do speculative walk for more than one
component ("if everything's valid, the next 5 components would be these
and they all are on that filesystem; hey, server - are these 5 still valid?")
and avoid pointless roundtrips.
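
The round-trip arithmetic behind that can be sketched with a userspace toy in C. This is not kernel code and not a proposed interface: toy_entry, the generation counters and both helper functions are hypothetical names made up for illustration, and "asking the server" is modelled as nothing more than incrementing a counter. The bulk variant answers with the length of the prefix that is still valid, which is the kind of reply the speculative walk needs.

```c
#include <assert.h>
#include <stddef.h>

/* One cached path component, as seen by a toy network filesystem client. */
struct toy_entry {
    unsigned cached_gen;   /* generation the client remembers */
    unsigned server_gen;   /* generation the server would report now */
};

/* Number of simulated client/server round trips so far. */
static int roundtrips;

/*
 * Revalidate the whole chain with a single simulated request.
 * Returns how many leading entries are still valid.
 */
static size_t toy_bulk_revalidate(const struct toy_entry *chain, size_t n)
{
    size_t ok = 0;

    assert(chain != NULL || n == 0);
    roundtrips++;                       /* one request covers every entry */
    while (ok < n && chain[ok].cached_gen == chain[ok].server_gen)
        ok++;
    return ok;
}

/*
 * Revalidate one component per simulated request, stopping at the
 * first stale entry. Returns the same prefix length as the bulk variant.
 */
static size_t toy_stepwise_revalidate(const struct toy_entry *chain, size_t n)
{
    size_t ok = 0;

    assert(chain != NULL || n == 0);
    while (ok < n) {
        roundtrips++;                   /* one request per component */
        if (chain[ok].cached_gen != chain[ok].server_gen)
            break;
        ok++;
    }
    return ok;
}
```

For a five-entry chain whose fourth entry has gone stale, both functions report a valid prefix of three, but the stepwise version needs four simulated requests where the bulk version needs one.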

As for Neil's point re do_last() and friends being much too convoluted - yes,
they are. And it's not a result of trying to shoehorn everything in one
model. "Just let NFS have at it" as soon as we reach do_last() won't make
things any simpler, unfortunately - we'll end up with the same convoluted
mess copied in a bunch of filesystems, each with bugs of its own.

Don't get me wrong - do_last/lookup_open/atomic_open complex is not a pretty
sight and it does need cleaning up. Some of that got done lately, but that
was only a tangential hit. Untangling the whole damn thing won't be fun,
but we'll need to do it. Not this cycle, though - we already have more than
enough patches piled in fs/namei.c as it is...

2015-05-17 16:43:46

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 3:55 AM, Al Viro <[email protected]> wrote:
>
> And that is complete crap. Multi-component lookups do make sense; once
> we are at the edge of the area present in dcache, we _know_ there won't
> be any existing mountpoints involved; parsing the components and feeding
> them to fs at once, along with an array of dentries to fill makes perfect
> sense. Why bother with a bunch of roundtrips when we can have one?

Yes, the edges are easier. And yes, it's fine to do components one by one.

Maybe I misunderstood, but I thought that was exactly what Neil
*didn't* want to do, though. It sounded like he wanted to do
path-based lookup, not component-based one.

But yes, if it's purely about preloading the cache, then *that* should
be reasonably easy. In fact, it should work as-is today, if we just
added a "const char *hint" to the lookup callback which told the
filesystem what will come after this lookup. But it would be a hint
for pre-loading the dcache, nothing more.

So if we have a pathname like "a/b/c" that we don't have in the
dcache, and we're going to look up component "a", we could give "b/c"
as the hint, and a filesystem that currently populates the dcache with
"a" by doing

d_instantiate(dentry, inode);

could decide that *before* it does that "d_instantiate()", it could
pre-populate the child list of 'dentry' with the lookup information
for 'b' (and possibly recursively for 'c' too under 'd').

But you'd still have to do the components one by one, you couldn't
just do the "final" tip.

And no, I absolutely refuse to even entertain the thought of the
filesystem actually doing any of the do_last crap. It would be purely
about pre-populating the dcache deeper than the one single component,
and then the VFS layer would just find the pre-populated dentries and
do the normal thing.

Doing things that way means that not only does do_last() at the vfs
level already do the right thing, but we get all the per-component
semantics (with security checks etc) right, because we'd still be
traversing the pathname one component at a time. It's just the
filesystem that could prime the cache.
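
As a rough illustration of the hint idea, here is a userspace toy model in C. It is not kernel code: toy_dcache, toy_fs_lookup() and the use_hint switch are hypothetical names invented for this sketch, and the toy filesystem is assumed to contain every name it is asked about. The walk still goes one component at a time; the only thing the hint changes is how often the filesystem has to be asked.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DENTRIES 32
#define TOY_NAME_LEN     16

struct toy_dentry {
    int  parent;                  /* index of the parent entry, -1 for the root */
    char name[TOY_NAME_LEN];
};

struct toy_dcache {
    struct toy_dentry d[TOY_MAX_DENTRIES];
    int count;                    /* entries in use; entry 0 is the root */
    int fs_calls;                 /* how often the "filesystem" was asked */
    int use_hint;                 /* does the toy filesystem act on hints? */
};

static void toy_init(struct toy_dcache *c, int use_hint)
{
    memset(c, 0, sizeof(*c));
    c->d[0].parent = -1;
    c->count = 1;
    c->use_hint = use_hint;
}

/* Cache-only search for a child of 'parent'; returns its index or -1. */
static int toy_find(const struct toy_dcache *c, int parent,
                    const char *name, size_t len)
{
    for (int i = 1; i < c->count; i++) {
        const struct toy_dentry *e = &c->d[i];
        if (e->parent == parent && strlen(e->name) == len &&
            memcmp(e->name, name, len) == 0)
            return i;
    }
    return -1;
}

static int toy_add(struct toy_dcache *c, int parent,
                   const char *name, size_t len)
{
    assert(c->count < TOY_MAX_DENTRIES && len < TOY_NAME_LEN);
    struct toy_dentry *e = &c->d[c->count];
    e->parent = parent;
    memcpy(e->name, name, len);
    e->name[len] = '\0';
    return c->count++;
}

/*
 * Stand-in for a filesystem lookup callback that also receives a hint.
 * It always instantiates the requested name; if it chooses to use the
 * hint, it pre-populates the components named there as well.
 */
static int toy_fs_lookup(struct toy_dcache *c, int parent,
                         const char *name, size_t len, const char *hint)
{
    int first, cur;

    c->fs_calls++;
    first = cur = toy_add(c, parent, name, len);
    while (c->use_hint && hint && *hint) {
        size_t n = strcspn(hint, "/");
        if (n)
            cur = toy_add(c, cur, hint, n);
        hint += n;
        if (*hint == '/')
            hint++;
    }
    return first;
}

/* Walk 'path' one component at a time; ask the fs only on a cache miss. */
static int toy_walk(struct toy_dcache *c, const char *path)
{
    int cur = 0;

    while (*path) {
        size_t n = strcspn(path, "/");
        const char *rest = path + n;
        if (*rest == '/')
            rest++;
        if (n) {
            int next = toy_find(c, cur, path, n);
            if (next < 0)
                next = toy_fs_lookup(c, cur, path, n, rest);
            cur = next;
        }
        path = rest;
    }
    return cur;
}
```

In this model, walking "a/b/c" on an empty cache asks the filesystem three times when the hint is ignored and once when it is used, while the walk loop itself is identical in both cases.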

If *that* was what Neil wanted to do (rather than do "a/b/c" as one
single lookup to the server), then I withdraw all my complaints and am
sorry for having misunderstood.

Linus

2015-05-17 16:56:22

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 9:43 AM, Linus Torvalds
<[email protected]> wrote:
>
> d_instantiate(dentry, inode);
>
> could decide that *before* it does that "d_instantiate()", it could
> pre-populate the child list of 'dentry' with the lookup information
> for 'b' (and possibly recursively for 'c' too under 'd').

Yech, nfs uses d_splice_alias(), so we'd have to add some logic to
throw away any preloaded entries there if we actually have an alias
and end up merging and using the alias. Or something. Because I think
we'd have a dentry leak otherwise.

So it wouldn't work "as-is", but it might still be reasonably easy.

Linus

2015-05-17 23:16:43

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, 17 May 2015 09:43:34 -0700 Linus Torvalds
<[email protected]> wrote:

> On Sun, May 17, 2015 at 3:55 AM, Al Viro <[email protected]> wrote:
> >
> > And that is complete crap. Multi-component lookups do make sense; once
> > we are at the edge of the area present in dcache, we _know_ there won't
> > be any existing mountpoints involved; parsing the components and feeding
> > them to fs at once, along with an array of dentries to fill makes perfect
> > sense. Why bother with a bunch of roundtrips when we can have one?
>
> Yes, the edges are easier. And yes, it's fine to do components one by one.
>
> Maybe I misunderstood, but I thought that was exactly what Neil
> *didn't* want to do, though. It sounded like he wanted to do
> path-based lookup, not component-based one.

Just to be crystal clear about what I want:
I want the filesystem to be in control

Any examples, whether about multi-component lookup or path-based lookup or
O_EXCL opens are just throw-away examples. I have no desire to implement or
re-implement anything like that. I just want the filesystem to have control.
The reason I want this is that it will (ultimately) make the code easier to
understand and so easier to verify. And it will make implementing unusual
filesystems easier.

The dcache is just a cache. It is a great cache, but it isn't the filesystem.

So filesystems should be able to put things in the cache. And the VFS should
be able to look up things in the cache. And if the VFS finds everything it
needs to follow a full path all the way to the inode at the end, that is
great. But as soon as it hits something that the cache doesn't have an
answer for, it asks the filesystem.
As a useful simple case it can ask via d_revalidate in RCU mode, in which
case the filesystem either says (based on its own caching rules) "Yeah, this
one's OK really" and the VFS just keeps going, or the filesystem says "Nope,
I need more time with this one" and we drop out of RCU and go to the more
general case.
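
That two-mode handshake can be modelled in a few lines of userspace C. In the kernel the refusal is signalled by ->d_revalidate() returning -ECHILD when called with LOOKUP_RCU; the toy below uses an enum instead, and every name in it (toy_dentry, toy_revalidate(), toy_walk()) is hypothetical. It also simplifies by continuing from the current component after leaving the lockless mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_verdict { TOY_VALID, TOY_STALE, TOY_NEED_BLOCKING };

struct toy_dentry {
    bool fresh;         /* cached info is recent enough to trust as-is */
    bool still_exists;  /* what the server would answer if asked */
};

/*
 * Toy revalidate: in the lockless mode it may only answer from local
 * state; if that is not enough it asks to be called again in the
 * blocking mode, where it is allowed to "talk to the server".
 */
static enum toy_verdict toy_revalidate(const struct toy_dentry *d, bool rcu_mode)
{
    if (d->fresh)
        return TOY_VALID;
    if (rcu_mode)
        return TOY_NEED_BLOCKING;
    return d->still_exists ? TOY_VALID : TOY_STALE;
}

struct toy_walk_result {
    size_t validated;   /* leading components accepted */
    bool   left_rcu;    /* did the walk have to leave lockless mode? */
};

static struct toy_walk_result toy_walk(const struct toy_dentry *chain, size_t n)
{
    struct toy_walk_result r = { 0, false };
    bool rcu = true;

    assert(chain != NULL || n == 0);
    while (r.validated < n) {
        enum toy_verdict v = toy_revalidate(&chain[r.validated], rcu);

        if (v == TOY_NEED_BLOCKING) {
            /* Drop out of the lockless mode and retry this component. */
            rcu = false;
            r.left_rcu = true;
            continue;
        }
        if (v == TOY_STALE)
            break;
        r.validated++;
    }
    return r;
}
```

A chain of three fresh entries is accepted without ever leaving the lockless mode; a chain whose second entry has aged forces the switch, and the walk then stops at whichever entry the server reports as gone.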

In that general case it just hands everything to the filesystem.
The filesystem then uses generic helpers (or not) to find the answers and adds
more current information to the cache.
It could potentially just return and let the VFS continue down the cache (now
with current data), but it probably makes more sense for the filesystem to
explicitly return what it has.

So for Al's example of revalidating multiple components at once, once the VFS
gets to a point in the path where d_revalidate says "I need more time",
the VFS just passes the rest of the path to the filesystem.
The filesystem can then see what is in the cache and revalidate multiple
dentries in parallel. Or it could just send the rest of the path to the
server requesting attributes for each directory in the path, and then can pop
all of that into the dcache/icache and let the lookup complete.
Or it can just do one component at a time.


>
> But yes, if it's purely about preloading the cache, then *that* should
> be reasonably easy. In fact, it should work as-is today, if we just
> added a "const char *hint" to the lookup callback which told the
> filesystem what will come after this lookup. But it would be a hint
> for pre-loading the dcache, nothing more.

"hint" being a synonym for "layering violation" ??


NeilBrown

>
> So if we have a pathname like "a/b/c" that we don't have in the
> dcache, and we're going to look up component "a", we could give "b/c"
> as the hint, and a filesystem that currently populates the dcache with
> "a" by doing
>
> d_instantiate(dentry, inode);
>
> could decide that *before* it does that "d_instantiate()", it could
> pre-populate the child list of 'dentry' with the lookup information
> for 'b' (and possibly recursively for 'c' too under 'd').
>
> But you'd still have to do the components one by one, you couldn't
> just do the "final" tip.
>
> And no, I absolutely refuse to even entertain the thought of the
> filesystem actually doing any of the do_last crap. It would be purely
> about pre-populating the dcache deeper than the one single component,
> and then the VFS layer would just find the pre-populated dentries and
> do the normal thing.
>
> Doing things that way means that not only does do_last() at the vfs
> level already do the right thing, but we get all the per-component
> semantics (with security checks etc) right, because we'd still be
> traversing the pathname one component at a time. It's just the
> filesystem that could prime the cache.
>
> If *that* was what Neil wanted to do (rather than do "a/b/c" as one
> single lookup to the server), then I withdraw all my complaints and am
> sorry for having misunderstood.
>
> Linus



2015-05-17 23:39:34

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, 17 May 2015 11:55:35 +0100 Al Viro <[email protected]> wrote:

> As for Neil's point re do_last() and friends being much too convoluted - yes,
> they are. And it's not a result of trying to shoehorn everything in one
> model. "Just let NFS have at it" as soon as we reach do_last() won't make
> things any simpler, unfortunately - we'll end up with the same convoluted
> mess copied in a bunch of filesystems, each with bugs of its own.

There is no reason to be so gloomy.
The VFS would provide a generic_do_last() (or whatever) which handles
everything correctly for local filesystems which keep the dcache precisely
consistent and use it for all the valuable locking it can provide.
generic_do_last() would call into other filesystem entry points just like the
current do_last() does. It wouldn't bother with 'revalidation' though.

Then there might be a "generic_network_do_last()" which does minimal if any
checking because the server will do all that, and just calls back to the
filesystem depending on which particular operation is happening - mkdir, or
unlink or whatever.

Filesystem developers are both very clever and (I assume) very lazy. If you
provide them with tools that work and are easy to understand, they will much
rather use them than try to implement their own. This is why the page cache
is such a success.


The more I think about this, the more it seems to me that having a dentry
that can be "unknown", or as you put it "in-lookup" is
potentially transformational.

Pathname lookup just always returns the dentry of the final component (plus
vfsmount), but that final component might be "unknown".

Then the various "last" steps are quite separate and each work with this
dentry. They unlink or mkdir or mknod or lookup or create or whatever.
There are probably races that make this a little more complicated than I am
hoping, and rename will doubtlessly be the most interesting. But it feels
like a good direction.

That is all a long way off of course. Even just getting a "in-lookup" dentry
and starting to explore what can be done with that would be a very positive
step I think.

Thanks,
NeilBrown



2015-05-18 00:10:18

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Mon, May 18, 2015 at 09:39:07AM +1000, NeilBrown wrote:

> There is no reason to be so gloomy.

RTFS.

> The VFS would provide a generic_do_last() (or whatever) which handles
> everything correctly for local filesystems which keep the dcache precisely
> consistent and use it for all the valuable locking it can provide.
> generic_do_last() would call into other filesystem entry points just like the
> current do_last() does. It wouldn't bother with 'revalidation' though.
>
> Then there might be a "generic_network_do_last()" which does minimal if any
> checking because the server will do all that, and just calls back to the
> filesystem depending on which particular operation is happening - mkdir, or
> unlink or whatever.

RTFPOSIX. Semantics of the last step on open is very different from the
rest due to symlink handling.

And do_last() has nothing whatsoever to do with mkdir() or unlink(). _Those_
are much simpler and don't go anywhere near that rats' nest of horrors.

Seriously, read the damn thing. It *is* horrible, all right, but shifting
it inside NFS won't help you at all. Especially since you have NFSv3 to
cope with, which will take care of bringing in every sodding bit NFSv4 might
evade (for non-directories, that is - for directories you get the full shitpile
in your face, NFSv4 or not).

And no, server will _not_ "do all that". Again, mkdir and unlink are (almost)
trivial (unlink less so, due to NFS-specific shite). The real horrors are
on open().

I would be a lot less gloomy about discussing just passing the buck to
filesystems, starting with NFS, if NFS folks (you, in particular) had
bothered to figure out what the existing code _does_ and why is it doing
what it's doing. So far you have not - not even on the level of "which
functions are hit in which syscall", let alone what those functions are
doing.

And yes, the documentation of the whole thing is piss-poor. What we have
there right now is a weird mix of bits and pieces referring to very different
periods of evolution. Flat-out contradicting each other *and* the code.
I know. Believe me, I know. Fuck, right now the call graph for that thing
_finally_ fits into A4. Four years ago it took a goddamned A_1_. As in,
eight A4 sheets taped together. Yes, really.

At least now we finally have a reasonable chance of getting that sucker into
understandable shape and maybe even getting more folks to understand what
that code is doing. Which, unfortunately, _is_ a requirement for serious
reworks of that code. Frankly, the burden of keeping dcache consistent is
the least of the PITA in there.

2015-05-18 02:56:38

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 4:16 PM, NeilBrown <[email protected]> wrote:
>
> Just to be crystal clear about what I want:
> I want the filesystem to be in control

Yeah, no. Not going to happen.

You seem to think that the dcache is "just" a cache. It's not. It's a
cache, but that is absolutely not all that it is. It's very much a
cache with strong semantics.

And no, we're not handing those semantics over to the filesystem.
The dcache is not just a cache, it's the *primary* data structure that
we use for pathname validation, local security checking, and for doing
things like "getcwd()" and handling ".." etc.

So there's no way the filesystem is "in control". You as a filesystem
are not really even doing the actual pathname lookup. The *only* thing
you're doing is filling in the dcache. The actual real pathname lookup
is done by the VFS layer using the dcache data.

That's how it very fundamentally works. It's *so* much more than a
cache - it really *is* the primary path lookup. The filesystem is the
slave in this relationship.

> The filesystem then uses generic helpers (or not) to find the answers and adds
> more current information to the cache.

You can do that already. There *are* those generic helpers to add data
to the cache. That's what "d_instantiate()" and friends _are_ for.

But no, you do *not* control name lookup. You get notified when
there's not enough data in the cache, and then you can fill it up any
which way you want.

You can populate the dcache with other entries than the one we asked
for, and you can ask the dcache to revalidate and throw dentries out.

But no, you do *not* get access to things like do_last() or to the
decision to follow symlinks or namespace rules, or mountpoints or
things like that.

> So for Al's example of revalidating multiple components at once, once the VFS
> gets to a point in the path where d_revalidate says "I need more time",
> the VFS just passes the rest of the path to the filesystem.

That's bullshit, for a very simple and basic reason: "the rest of the
path" is not necessarily at all for your filesystem!

Really. There might be mount-points, there might be symlinks, there
might be tons of stuff like that.

You're not getting control, for the very simple reason that IT IS NOT
YOUR DATA. And it really never ever will be.

Now, this is why I said we can do a "hint" style thing. Part of that
"hint" issue is very very much that it has no semantic meaning. You
can't screw it up, because if it turns out that the path component
we're looking up is a symlink and we actually end up in some other
filesystem, if you end up looking up the hint part, it just would
never actually get used.

So it's kind of like a prefetch for names. It's semantically much
weaker than saying "look up this name". The hint would be "this is
likely the next part of the name that the VFS layer will look up".

And the key part of that statement is
(a) "likely" (it might not happen, and even if it does happen, it
might not be for your filesystem)
and
(b) "the VFS layer will look up" because it won't be the low-level
filesystem doing it.

So it would be the low-level filesystem pre-populating the dcache - if
the low-level filesystem decides the hint is worth using for that -
and the VFS layer then uses the data in the dcache without further
bothering the filesystem.

Exactly because the dcache is *so* much more than "just a cache".

Linus

2015-05-18 03:43:05

by Al Viro

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 07:56:26PM -0700, Linus Torvalds wrote:

> > So for Al's example of revalidating multiple components at once, once the VFS
> > gets to a point in the path where d_revalidate says "I need more time",
> > the VFS just passes the rest of the path to the filesystem.
>
> That's bullshit, for a very simple and basic reason: "the rest of the
> path" is not necessarily at all for your filesystem!
>
> Really. There might be mount-points, there might be symlinks, there
> might be tons of stuff like that.

"Rest of the path" makes no sense, obviously. "More of the path" (and _not_
as a string, TYVM - we have those components in ->d_name.name of dentries we
want revalidated, complete with hashes, so WTF redo that?) is fine, though,
because the caller knows exactly where the mountpoints are, so we know how far
we can go.

> Now, this is why I said we can do a "hint" style thing. Part of that
> "hint" issue is very very much that it has no semantic meaning. You
> can't screw it up, because if it turns out that the path component
> we're looking up is a symlink and we actually end up in some other
> filesystem, if you end up looking up the hint part, it just would
> never actually get used.

For revalidate "this used to be a symlink, now it's not" or vice versa
means simply "it's gone stale". Which is fine - again, the caller knows
where in that chain the symlinks are (and they obviously terminate the
chain to be revalidated).
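
How far such a chain can extend is something the caller can compute from state it already holds. A minimal userspace sketch in C, under stated assumptions: toy_component and toy_batch_len() are hypothetical names, the flags stand in for whatever the real dentries would tell the caller, and the choice to stop *before* a mountpoint, a symlink or ".." (rather than include it) is an assumption of this sketch, not something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the caller already knows about each cached component. */
struct toy_component {
    int  fs_id;          /* which filesystem the entry lives on */
    bool is_mountpoint;
    bool is_symlink;
    bool is_dotdot;
};

/*
 * Length of the leading run of components that could go to one
 * filesystem in a single bulk revalidation: same filesystem, no
 * mountpoint, no symlink, no "..", and no longer than the limit
 * the filesystem asked for.
 */
static size_t toy_batch_len(const struct toy_component *c, size_t n,
                            size_t fs_limit)
{
    size_t len = 0;

    assert(c != NULL || n == 0);
    while (len < n && len < fs_limit) {
        if (c[len].is_dotdot || c[len].is_mountpoint || c[len].is_symlink)
            break;                      /* chain ends before this entry */
        if (len > 0 && c[len].fs_id != c[0].fs_id)
            break;                      /* crossed onto another filesystem */
        len++;
    }
    return len;
}
```

For a four-component path whose third entry is a mountpoint, the batch covers the first two entries, or fewer if the filesystem's limit is smaller.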

Anyway, it's a side issue; we _can_ use the capability to do multi-component
lookups and with link_path_walk()-related logics getting untangled, we might
be able to do just that without messing the code up. It's very clearly not
a 4.2 fodder, though, and I'm not sure how high priority it is for 4.3,
simply because making the exclusion between lookups weaker seems to be
of more general interest. And those two will definitely be stepping on the
same area, so it probably makes more sense to sort the parallel lookups out
first. _IF_ we manage to get that by 4.3-rc<early>, sure, continuation into
multicomponent lookups would start looking as 4.3 material. Hell knows -
stranger things have happened...

What we really need is a coherent documentation on the whole pathname-related
machinery; I've some preliminary bits and pieces written, but it'll take
more work. Hopefully I'll have something postable in a week or less...

Right now I can think of 4 or 5 people familiar with the area, myself
included. And we need more - it shouldn't be a fucking black magic, since
fs folks really ought to understand what's going on there. The thing is,
under the assorted layers of cruft it's simpler than e.g. VMA handling...

2015-05-18 04:21:07

by Linus Torvalds

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 8:42 PM, Al Viro wrote:
>
> "Rest of the path" makes no sense, obviously. "More of the path" (and _not_
> as a string, TYVM - we have those components in ->d_name.name of dentries we
> want revalidated [..])

For revalidate, yes we kind of have them as dentries. I say kind of,
because it may be that we're only revalidating a directory in the
middle, and the rest of the path will be all new lookups, and we don't
have that part as dentries at all. But nobody is going to care about
"revalidate" for those.

HOWEVER.

We haven't actually walked/parsed the rest of the pathname yet at that
point, and we generally probably shouldn't, since we don't know if the
filesystem really is going to care. Why do extra work that may not be
useful?

So if we really do want to do it, and some filesystem cares enough, I
think we actually should just pass it in as a string, and then have a
helper function to say "ok, filesystem, if you want to revalidate the
rest of the path, use this function to turn the hinting string into
more dentries".

Because I don't think it's worth doing up-front if it's not clear that
the filesystem is going to care.
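
A rough userspace sketch of what such a helper might look like, for illustration only. Nothing here is kernel API: for_each_hint_component(), hint_component_fn and sum_lengths() are hypothetical names, and the callback stands in for "turn this component into a dentry". Stopping at ".." is an assumption borrowed from elsewhere in this thread; symlinks and mountpoints cannot be recognised from the string alone, so the caller would still have to treat every result as a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback: receives one path component (NOT NUL-terminated).
 * Returning nonzero stops the walk early.
 */
typedef int (*hint_component_fn)(const char *name, size_t len, void *ctx);

/*
 * Walk the components of a hint string, handing each one to fn().
 * Repeated slashes are collapsed, "." is skipped, and the walk stops
 * at the first "..". Returns the number of components passed to fn().
 */
static size_t for_each_hint_component(const char *hint,
				      hint_component_fn fn, void *ctx)
{
	size_t count = 0;

	assert(hint && fn);
	while (*hint) {
		size_t len;

		while (*hint == '/')		/* collapse repeated slashes */
			hint++;
		len = strcspn(hint, "/");
		if (len == 0)
			break;			/* nothing but trailing slashes */
		if (len == 2 && hint[0] == '.' && hint[1] == '.')
			break;			/* ".." may leave this filesystem */
		if (!(len == 1 && hint[0] == '.')) {
			count++;
			if (fn(hint, len, ctx))
				break;		/* callback asked us to stop */
		}
		hint += len;
	}
	return count;
}

/* Example callback: add each component's length to the size_t behind ctx. */
static int sum_lengths(const char *name, size_t len, void *ctx)
{
	(void)name;
	*(size_t *)ctx += len;
	return 0;
}
```

The point of the callback shape is that a filesystem which does not care (the /proc case below) simply never calls the helper, so no parsing work is done on its behalf.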

For example, the /proc filesystem uses d_revalidate() to re-check the
pid entries. That does *not* mean that it wants the rest of the
dentries pre-parsed at all. So by all means give it a "const char
*hint" for the rest, but it's going to just ignore it anyway, so don't
waste time parsing it as "these will be the following dentries we will ask
you to revalidate".

So I do think that "const char *hint" might be the right thing to pass
down if we really care about this sufficiently. Both to revalidate and
to lookup().

We currently have that "rest" in lookup_slow() as

nd->last.name + hashlen_len(nd->last.hash_len)

and we could just pass that down to lookup_dcache (which does
revalidate) and lookup_real() (which does the actual ->lookup).

But I guess we could just save off 'name' into a new field in
nameidata in link_path_walk() after we've removed the slashes? I don't
know if it matters. We can mask off the slashes again if/when people
want to use the hint, after all.

Linus
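
For readers unfamiliar with the expression in the message above, here is a small userspace model of how the "rest" pointer falls out of the last component's name and packed length. It is a sketch under assumptions: struct last_component, pack_hashlen(), unpack_len(), path_hint() and skip_slashes() are made-up stand-ins, and the "length in the upper 32 bits, hash in the lower 32 bits" layout is modelled on the kernel's hash_len packing rather than copied from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a component's hash and length into one 64-bit value (assumed layout). */
static uint64_t pack_hashlen(uint32_t hash, uint32_t len)
{
	return ((uint64_t)len << 32) | hash;
}

/* Recover the component length from the packed value. */
static uint32_t unpack_len(uint64_t hash_len)
{
	return (uint32_t)(hash_len >> 32);
}

/* Hypothetical model of the "last component" tracked during a path walk. */
struct last_component {
	const char *name;	/* points into the pathname being resolved */
	uint64_t hash_len;	/* packed hash and length of this component */
};

/*
 * The hint: everything after the current component. Any separating
 * slashes are still present at the start of the returned string.
 */
static const char *path_hint(const struct last_component *last)
{
	assert(last && last->name);
	return last->name + unpack_len(last->hash_len);
}

/* Strip the leading slashes if and when somebody wants to use the hint. */
static const char *skip_slashes(const char *s)
{
	assert(s);
	while (*s == '/')
		s++;
	return s;
}
```

With the pathname "usr//share/doc" and a last component of length 3, path_hint() points at "//share/doc" and skip_slashes() on that result points at "share/doc". No copying is involved, which is why passing the hint down costs next to nothing when the filesystem ignores it.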

2015-05-19 08:33:43

by NeilBrown

Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, 17 May 2015 19:56:26 -0700 Linus Torvalds wrote:

> On Sun, May 17, 2015 at 4:16 PM, NeilBrown wrote:
> >
> > Just to be crystal clear about what I want:
> > I want the filesystem to be in control
>
> Yeah, no. Not going to happen.
>
> You seem to think that the dcache is "just" a cache. It's not. It's a
> cache, but that is absolutely not all that it is. It's very much a
> cache with strong semantics.
>
> And no, we're not handing those semantics over to the filesystem.
> The dcache is not just a cache, it's the *primary* data structure that
> we use for pathname validation, local security checking, and for doing
> things like "getcwd()" and handling ".." etc.

A fact that makes it relatively easy to create a situation where 'getcwd()'
returns a string for which 'stat' says ENOENT or where "cd .." puts you
somewhere that "getcwd" gets quite upset:

$ cd ..
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

$ ls -l /proc/self/cwd
lrwxrwxrwx 1 neilb users 0 May 19 17:28 /proc/self/cwd -> /mnt/tmp/cdir (deleted)

(and no: I hadn't deleted the cwd, just renamed some things on the server)

>
> So there's no way the filesystem is "in control". You as a filesystem
> are not really even doing the actual pathname lookup. The *only* thing
> you're doing is filling in the dcache. The actual real pathname lookup
> is done by the VFS layer using the dcache data.
>
> That's how it very fundamentally works. It's *so* much more than a
> cache - it really *is* the primary path lookup. The filesystem is the
> slave in this relationship.

This requires the VFS to have knowledge, sometimes intimate knowledge, of how
each filesystem works.
DCACHE_NFSFS_RENAMED ?
Oh wait, afs and btrfs know about that too, so it can't be too intimate.

>
> > The filesystem then uses generic helpers (or not) to find the answers and adds
> > more current information to the cache.
>
> You can do that already. There *are* those generic helpers to add data
> to the cache. That's what "d_instantiate()" and friends _are_ for.
>
> But no, you do *not* control name lookup. You get notified when
> there's not enough data in the cache, and then you can fill it up any
> which way you want.
>
> You can populate the dcache with other entries than the one we asked
> for, and you can ask the dcache to revalidate and throw dentries out.
>
> But no, you do *not* get access to things like do_last() or to the
> decision to follow symlinks or namespace rules, or mountpoints or
> things like that.

Obviously the important rules that you mention would be handled by library
code. But do_last() could be a lot simpler if the filesystem could manage the
'stale dentry' handling and call one version or the other of do_last
depending on whether it had an 'atomic_open' callback or not.

>
> > So for Al's example of revalidating multiple components at once, once the VFS
> > gets to a point in the path where d_revalidate says "I need more time",
> > the VFS just passes the rest of the path to the filesystem.
>
> That's bullshit, for a very simple and basic reason: "the rest of the
> path" is not necessarily at all for your filesystem!

For revalidate: probably not, though the filesystem can ask questions of the
dcache just as easily as the VFS. For lookup, the rest of the path up to a
".." or symlink (which the filesystem can easily recognise) does belong to
the filesystem.

On this topic, Al suggested:

> With bulk revalidate covering
> all the chain when we stumble across .., mountpoint or something we believe
> to be a symlink, or when the chain reaches fs-specified limit.

That "fs-specified limit" is what really bothers me. This is feeding more
information about the fs into the VFS, and it assumes that a "limit" is the
thing that is meaningful for the VFS to know. Just let the FS take over and
use the approved interfaces to collect the dentries that it thinks might be
useful to revalidate, and then revalidate them.


>
> Really. There might be mount-points, there might be symlinks, there
> might be tons of stuff like that.
>
> You're not getting control, for the very simple reason that IT IS NOT
> YOUR DATA. And it really never ever will be.
>
> Now, this is why I said we can do a "hint" style thing. Part of that
> "hint" issue is very very much that it has no semantic meaning. You
> can't screw it up, because if it turns out that the path component
> we're looking up is a symlink and we actually end up in some other
> filesystem, if you end up looking up the hint part, it just would
> never actually get used.
>
> So it's kind of like a prefetch for names. It's semantically much
> weaker than saying "look up this name". The hint would be "this is
> likely the next part of the name that the VFS layer will look up".
>
> And the key part of that statement is
> (a) "likely" (it might not happen, and even if it does happen, it
> might not be for your filesystem)
> and
> (b) "the VFS layer will look up" because it won't be the low-level
> filesystem doing it.
>
> So it would be the low-level filesystem pre-populating the dcache - if
> the low-level filesystem decides the hint is worth using for that -
> and the VFS layer then uses the data in the dcache without further
> bothering the filesystem.
>
> Exactly because the dcache is *so* much more than "just a cache".
>
> Linus

Well, let's just start with that "in-lookup" or "unknown" dentry that has
been mentioned, so that the VFS doesn't have to hold i_mutex across lookup
and create, and so that the filesystem can at least control its own locking.

That would be a big step forward in my mind.

Thanks,
NeilBrown

