2004-06-22 20:07:19

by Al Viro

[permalink] [raw]
Subject: [PATCH] (1/9) symlink patchkit (against -bk current)

infrastructure - helpers allowing ->follow_link() to leave
a pathname to be traversed by caller + corresponding code in callers.

diff -urN RC7-bk5-base/fs/namei.c RC7-bk5-core/fs/namei.c
--- RC7-bk5-base/fs/namei.c Tue Jun 22 13:15:08 2004
+++ RC7-bk5-core/fs/namei.c Tue Jun 22 15:13:10 2004
@@ -395,6 +395,8 @@
return result;
}

+static inline int __vfs_follow_link(struct nameidata *, const char *);
+
/*
* This limits recursive symlink follows to 8, while
* limiting consecutive symlinks to 40.
@@ -405,19 +407,30 @@
static inline int do_follow_link(struct dentry *dentry, struct nameidata *nd)
{
int err = -ELOOP;
- if (current->link_count >= 5)
+ if (current->link_count >= MAX_NESTED_LINKS)
goto loop;
if (current->total_link_count >= 40)
goto loop;
+ BUG_ON(nd->depth >= MAX_NESTED_LINKS);
cond_resched();
err = security_inode_follow_link(dentry, nd);
if (err)
goto loop;
current->link_count++;
current->total_link_count++;
+ nd->depth++;
touch_atime(nd->mnt, dentry);
+ nd_set_link(nd, NULL);
err = dentry->d_inode->i_op->follow_link(dentry, nd);
+ if (!err) {
+ char *s = nd_get_link(nd);
+ if (s)
+ err = __vfs_follow_link(nd, s);
+ if (dentry->d_inode->i_op->put_link)
+ dentry->d_inode->i_op->put_link(dentry, nd);
+ }
current->link_count--;
+ nd->depth--;
return err;
loop:
path_release(nd);
@@ -587,7 +600,7 @@
goto return_reval;

inode = nd->dentry->d_inode;
- if (current->link_count)
+ if (nd->depth)
lookup_flags = LOOKUP_FOLLOW;

/* At this point we know we have a real path component. */
@@ -795,6 +808,7 @@
*/
nd_root.last_type = LAST_ROOT;
nd_root.flags = nd->flags;
+ nd_root.depth = 0;
memcpy(&nd_root.intent, &nd->intent, sizeof(nd_root.intent));
read_lock(&current->fs->lock);
nd_root.mnt = mntget(current->fs->rootmnt);
@@ -867,6 +881,7 @@

nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags;
+ nd->depth = 0;

read_lock(&current->fs->lock);
if (*name=='/') {
@@ -1385,7 +1400,15 @@
if (error)
goto exit_dput;
touch_atime(nd->mnt, dentry);
+ nd_set_link(nd, NULL);
error = dentry->d_inode->i_op->follow_link(dentry, nd);
+ if (!error) {
+ char *s = nd_get_link(nd);
+ if (s)
+ error = __vfs_follow_link(nd, s);
+ if (dentry->d_inode->i_op->put_link)
+ dentry->d_inode->i_op->put_link(dentry, nd);
+ }
dput(dentry);
if (error)
return error;
@@ -2182,7 +2205,7 @@
}
res = link_path_walk(link, nd);
out:
- if (current->link_count || res || nd->last_type!=LAST_NORM)
+ if (nd->depth || res || nd->last_type!=LAST_NORM)
return res;
/*
* If it is an iterative symlinks resolution in open_namei() we
diff -urN RC7-bk5-base/include/linux/fs.h RC7-bk5-core/include/linux/fs.h
--- RC7-bk5-base/include/linux/fs.h Tue Jun 22 13:15:13 2004
+++ RC7-bk5-core/include/linux/fs.h Tue Jun 22 15:13:10 2004
@@ -902,6 +902,7 @@
struct inode *, struct dentry *);
int (*readlink) (struct dentry *, char __user *,int);
int (*follow_link) (struct dentry *, struct nameidata *);
+ void (*put_link) (struct dentry *, struct nameidata *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, struct nameidata *);
int (*setattr) (struct dentry *, struct iattr *);
diff -urN RC7-bk5-base/include/linux/namei.h RC7-bk5-core/include/linux/namei.h
--- RC7-bk5-base/include/linux/namei.h Mon Jul 28 11:01:24 2003
+++ RC7-bk5-core/include/linux/namei.h Tue Jun 22 15:13:10 2004
@@ -10,12 +10,16 @@
int create_mode;
};

+enum { MAX_NESTED_LINKS = 5 };
+
struct nameidata {
struct dentry *dentry;
struct vfsmount *mnt;
struct qstr last;
unsigned int flags;
int last_type;
+ unsigned depth;
+ char *saved_names[MAX_NESTED_LINKS + 1];

/* Intent data */
union {
@@ -66,5 +70,15 @@

extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
+
+static inline void nd_set_link(struct nameidata *nd, char *path)
+{
+ nd->saved_names[nd->depth] = path;
+}
+
+static inline char *nd_get_link(struct nameidata *nd)
+{
+ return nd->saved_names[nd->depth];
+}

#endif /* _LINUX_NAMEI_H */


2004-06-22 22:56:46

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] (1/9) symlink patchkit (against -bk current)

Al (or other),

Was there anything that described what these 9 patches are doing,
overall? They look intriguing, but I'm too lazy to read all the code
in order to get the gist of them. The per-patch comments at the top
mention something of a "new scheme", but don't, that I noticed, explain
what that means. Perhaps something on lkml I missed ...?

Thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2004-06-23 00:20:17

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] (1/9) symlink patchkit (against -bk current)

On Tue, Jun 22, 2004 at 03:55:23PM -0700, Paul Jackson wrote:
> Al (or other),
>
> Was there anything that described what these 9 patches are doing,
> overall? They look intriguing, but I'm too lazy to read all the code
> in order to get the gist of them. The per-patch comments at the top
> mention something of a "new scheme", but don't, that I noticed, explain
> what that means. Perhaps something on lkml I missed ...?

My apologies (there was chunk #0 in the set, but I've fscked the
script up - `seq 9` instead of `seq 0 9`). See below:

----------------------------------------------------------------------------

Here's a patchkit that seriously reduces the stack footprint
of symlink resolution _and_ cleans the things up ->readlink()/->follow_link()
implementations.

How it works:
* ->follow_link() still does what it used to do - replaces
vfsmount/dentry in the nameidata it got from caller. However, it
can also leave a pathname to be resolved by caller.
* we add an array of char * into nameidata; we always work
with nd->saved_names[nd->depth]. nd_set_link() sets it, nd_get_link()
returns it.
* callers of ->follow_link() (all two of them) check if
->follow_link() had left us something to do. If it had (return
value was zero and nd_get_link() is non-NULL), they do __vfs_follow_link()
on that name. Then they call a new method (->put_link()) that frees
whatever has to be freed, etc.

Note that absolute majority of symlinks have "resolve a pathname" as
part of their ->follow_link(); they can do something else and some
don't do that at all, but having that pathname resolution is very,
very common.

With that change we allow them to shift pathname resolution part
to caller. They don't have to - it's perfectly OK to do all work
in ->follow_link(). However, leaving the pathname resolution to
caller will
a) exclude foo_follow_link() stack frame from the picture
b) kill 2 stack frames - all callers are in fs/namei.c
and they can use inlined variant of vfs_follow_link().

That reduction of stack use is enough to push the limit on nested
symlinks from 5 to 8 (actually, even beyond that, but since 8 is
common for other Unices it will do fine).

For those who have "pure" ->follow_link() (i.e. "find a string that would
be symlink contents and say nd_set_link(nd, string)") we also get a common
helper implementing ->readlink() - it just calls ->follow_link() on a
dummy nameidata, calls vfs_readlink() on result of nd_get_link() and
does ->put_link(). Using (or not using) it is up to filesystem; it's
a helper that can be used as a ->readlink() for many filesystems, not
a reimplementation of sys_readlink(). However, that's _MANY_ filesystems -
practically all of them.

Note that we don't put any crap like "if this is a normal symlink,
do this; otherwise call ->follow_link() and let it do its magic"
into callers - all symlinks are handled the same way. Which was
the main problem with getlink proposal back then. In a sense, the
situation is inverse - instead of implementing ->follow_link() via
new method very similar to ->readlink() we provide a helper that
can be used as ->readlink() _if_ ->follow_link() satisfies some
(very common) conditions. If we need something trickier than
done by that helper - very well, fs should simply use whatever
code is suitable for it (presumably what it had always used as
->readlink()).

Change is backwards compatible - any filesystem that doesn't care to change
works as it always did. We won't get any stack footprint reduction on the
symlinks from that fs, obviously, but that's exactly what we had until now.
When all filesystems switch to that scheme, we can safely raise the limit
on nested symlinks. If some site knows that all filesystems they are using
are already there, they can obviously raise the limit for themselves not
waiting for the rest of the world...

Contents of patchkit (part that had been in -mm, that is):
SL0: infrastructure - helpers allowing ->follow_link() to leave
a pathname to be traversed by caller + corresponding code in callers.

SL1: ext2 conversion (helper functions for that one will be actually
used a lot by other filesystems, so to fs/namei.c they go)

SL2: trivial cases - ones where we have no need to clean up after
pathname traversal (link body embedded into inode, etc.). Plugged leak in
devfs_follow_link(), while we are at it.

SL3: cases that can simply reuse ext2 helpers (page_follow_link_light()
and page_put_link()).

SL4: smbfs - switched from on-stack allocation of buffer for link
body (!) to __getname()/putname(); switched to new scheme.

SL5: xfs switched to new scheme; leaks plugged.

SL6: shm switched (it almost belongs to SL3, but it does some extra
stuff after the link traversal).

SL7: befs switched; leaks plugged.

SL8: ditto for jffs2.