2013-09-04 19:06:14

by Waiman Long

Subject: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

When running AIM7's short workload, Linus' lockref patch eliminated
most of the spinlock contention. However, there was still some left:

8.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--42.21%-- d_path
| proc_pid_readlink
| SyS_readlinkat
| SyS_readlink
| system_call
| __GI___readlink
|
|--40.97%-- sys_getcwd
| system_call
| __getcwd

The big one here is the rename_lock (seqlock) contention in d_path()
and the getcwd system call. This patch eliminates the need to take
the rename_lock while translating dentries into full pathnames.

The rename_lock is taken to make sure that no rename operation can be
ongoing while the translation is in progress. However, only one thread
can hold the rename_lock at a time, blocking all the other threads that
need it even though the translation process makes no changes to the
dentries.

This patch replaces the writer's write_seqlock/write_sequnlock sequence
on the rename_lock in the callers of the prepend_path() and
__dentry_path() functions with the reader's read_seqbegin/read_seqretry
sequence inside these two functions. As a result, the code has to retry
if one or more rename operations were performed in the meantime. In
addition, the RCU read lock is taken during the translation process to
make sure that no dentry goes away.

The dentry's d_lock is also no longer taken, further reducing spinlock
contention. However, this means that the name pointer and other dentry
fields may not be valid. As a result, the code was enhanced to handle
the case where the parent pointer or the name pointer is NULL.

With this patch, _raw_spin_lock accounts for only 1.2% of the total
CPU cycles for the short workload. The patch also reduces the distortion
that running perf introduces into its own profile, since the perf
command itself can be a heavy user of the d_path() function depending
on the complexity of the workload.

When taking the perf profile of the high-systime workload, the spinlock
contention contributed by running perf itself was about 16% without
this patch. With this patch, that contention goes away and we get a
more accurate perf profile.

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 118 +++++++++++++++++++++++++++++++++++++++++------------------
1 files changed, 82 insertions(+), 36 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 96655f4..12d07d7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2512,7 +2512,19 @@ static int prepend(char **buffer, int *buflen, const char *str, int namelen)

static int prepend_name(char **buffer, int *buflen, struct qstr *name)
{
- return prepend(buffer, buflen, name->name, name->len);
+ /*
+ * With RCU path tracing, it may race with rename. Use
+ * ACCESS_ONCE() to make sure that it is either the old or
+ * the new name pointer. The length does not really matter as
+ * the sequence number check will eventually catch any ongoing
+ * rename operation.
+ */
+ const char *dname = ACCESS_ONCE(name->name);
+ int dlen = name->len;
+
+ if (unlikely(!dname || !dlen))
+ return -EINVAL;
+ return prepend(buffer, buflen, dname, dlen);
}

/**
@@ -2522,7 +2534,12 @@ static int prepend_name(char **buffer, int *buflen, struct qstr *name)
* @buffer: pointer to the end of the buffer
* @buflen: pointer to buffer length
*
- * Caller holds the rename_lock.
+ * The function tries to write out the pathname without taking any lock other
+ * than the RCU read lock to make sure that dentries won't go away. It only
+ * checks the sequence number of the global rename_lock as any change in the
+ * dentry's d_seq will be preceded by changes in the rename_lock sequence
+ * number. If the sequence number has changed, it will restart the whole
+ * pathname back-tracing sequence again.
*/
static int prepend_path(const struct path *path,
const struct path *root,
@@ -2533,7 +2550,15 @@ static int prepend_path(const struct path *path,
struct mount *mnt = real_mount(vfsmnt);
bool slash = false;
int error = 0;
+ unsigned seq;
+ char *bptr;
+ int blen;

+restart:
+ bptr = *buffer;
+ blen = *buflen;
+ seq = read_seqbegin(&rename_lock);
+ rcu_read_lock();
while (dentry != root->dentry || vfsmnt != root->mnt) {
struct dentry * parent;

@@ -2547,22 +2572,38 @@ static int prepend_path(const struct path *path,
continue;
}
parent = dentry->d_parent;
+ if (unlikely(parent == NULL)) {
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto restart;
+ goto garbage;
+ }
prefetch(parent);
- spin_lock(&dentry->d_lock);
- error = prepend_name(buffer, buflen, &dentry->d_name);
- spin_unlock(&dentry->d_lock);
+ error = prepend_name(&bptr, &blen, &dentry->d_name);
if (!error)
- error = prepend(buffer, buflen, "/", 1);
+ error = prepend(&bptr, &blen, "/", 1);
if (error)
break;

slash = true;
dentry = parent;
}
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto restart;

if (!error && !slash)
- error = prepend(buffer, buflen, "/", 1);
+ error = prepend(&bptr, &blen, "/", 1);
+
+ret:
+ if (error == -EINVAL)
+ goto garbage;
+ *buffer = bptr;
+ *buflen = blen;
+ return error;

+garbage:
+ error = prepend(buffer, buflen, "(garbage)", 10);
return error;

global_root:
@@ -2575,11 +2616,14 @@ global_root:
WARN(1, "Root dentry has weird name <%.*s>\n",
(int) dentry->d_name.len, dentry->d_name.name);
}
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto restart;
if (!slash)
- error = prepend(buffer, buflen, "/", 1);
+ error = prepend(&bptr, &blen, "/", 1);
if (!error)
error = is_mounted(vfsmnt) ? 1 : 2;
- return error;
+ goto ret;
}

/**
@@ -2607,9 +2651,7 @@ char *__d_path(const struct path *path,

prepend(&res, &buflen, "\0", 1);
br_read_lock(&vfsmount_lock);
- write_seqlock(&rename_lock);
error = prepend_path(path, root, &res, &buflen);
- write_sequnlock(&rename_lock);
br_read_unlock(&vfsmount_lock);

if (error < 0)
@@ -2628,9 +2670,7 @@ char *d_absolute_path(const struct path *path,

prepend(&res, &buflen, "\0", 1);
br_read_lock(&vfsmount_lock);
- write_seqlock(&rename_lock);
error = prepend_path(path, &root, &res, &buflen);
- write_sequnlock(&rename_lock);
br_read_unlock(&vfsmount_lock);

if (error > 1)
@@ -2696,9 +2736,7 @@ char *d_path(const struct path *path, char *buf, int buflen)

get_fs_root(current->fs, &root);
br_read_lock(&vfsmount_lock);
- write_seqlock(&rename_lock);
error = path_with_deleted(path, &root, &res, &buflen);
- write_sequnlock(&rename_lock);
br_read_unlock(&vfsmount_lock);
if (error < 0)
res = ERR_PTR(error);
@@ -2744,44 +2782,57 @@ char *simple_dname(struct dentry *dentry, char *buffer, int buflen)
*/
static char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
{
- char *end = buf + buflen;
- char *retval;
+ char *end, *retval;
+ int len, seq, error = 0;

- prepend(&end, &buflen, "\0", 1);
+restart:
+ end = buf + buflen;
+ len = buflen;
+ prepend(&end, &len, "\0", 1);
if (buflen < 1)
goto Elong;
/* Get '/' right */
retval = end-1;
*retval = '/';
-
+ seq = read_seqbegin(&rename_lock);
+ rcu_read_lock();
while (!IS_ROOT(dentry)) {
struct dentry *parent = dentry->d_parent;
int error;

+ if (unlikely(parent == NULL))
+ goto chk_seq;
prefetch(parent);
- spin_lock(&dentry->d_lock);
- error = prepend_name(&end, &buflen, &dentry->d_name);
- spin_unlock(&dentry->d_lock);
- if (error != 0 || prepend(&end, &buflen, "/", 1) != 0)
- goto Elong;
+ error = prepend_name(&end, &len, &dentry->d_name);
+ if (!error)
+ error = prepend(&end, &len, "/", 1);
+ if (error)
+ goto chk_seq;

retval = end;
dentry = parent;
}
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto restart;
return retval;
Elong:
return ERR_PTR(-ENAMETOOLONG);
+chk_seq:
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto restart;
+ if (error == -ENAMETOOLONG)
+ goto Elong;
+ error = prepend(&end, &len, "//garbage", 10);
+ if (error)
+ goto Elong;
+ return end;
}

char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
{
- char *retval;
-
- write_seqlock(&rename_lock);
- retval = __dentry_path(dentry, buf, buflen);
- write_sequnlock(&rename_lock);
-
- return retval;
+ return __dentry_path(dentry, buf, buflen);
}
EXPORT_SYMBOL(dentry_path_raw);

@@ -2790,7 +2841,6 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
char *p = NULL;
char *retval;

- write_seqlock(&rename_lock);
if (d_unlinked(dentry)) {
p = buf + buflen;
if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2798,7 +2848,6 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen)
buflen++;
}
retval = __dentry_path(dentry, buf, buflen);
- write_sequnlock(&rename_lock);
if (!IS_ERR(retval) && p)
*p = '/'; /* restore '/' overriden with '\0' */
return retval;
@@ -2837,7 +2886,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)

error = -ENOENT;
br_read_lock(&vfsmount_lock);
- write_seqlock(&rename_lock);
if (!d_unlinked(pwd.dentry)) {
unsigned long len;
char *cwd = page + PAGE_SIZE;
@@ -2845,7 +2893,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)

prepend(&cwd, &buflen, "\0", 1);
error = prepend_path(&pwd, &root, &cwd, &buflen);
- write_sequnlock(&rename_lock);
br_read_unlock(&vfsmount_lock);

if (error < 0)
@@ -2866,7 +2913,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
error = -EFAULT;
}
} else {
- write_sequnlock(&rename_lock);
br_read_unlock(&vfsmount_lock);
}

--
1.7.1


2013-09-04 19:11:10

by Al Viro

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 04, 2013 at 03:05:23PM -0400, Waiman Long wrote:
>
> static int prepend_name(char **buffer, int *buflen, struct qstr *name)
> {
> - return prepend(buffer, buflen, name->name, name->len);
> + /*
> + * With RCU path tracing, it may race with rename. Use
> + * ACCESS_ONCE() to make sure that it is either the old or
> + * the new name pointer. The length does not really matter as
> + * the sequence number check will eventually catch any ongoing
> + * rename operation.
> + */
> + const char *dname = ACCESS_ONCE(name->name);
> + int dlen = name->len;
> +
> + if (unlikely(!dname || !dlen))
> + return -EINVAL;
> + return prepend(buffer, buflen, dname, dlen);

NAK. A race with d_move() can very well leave you with dname pointing into
an object of length smaller than dlen. You *can* copy it byte-by-byte
and rely on NUL-termination, but you can't rely on length being accurate -
not without having excluded d_move().
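
For illustration, the byte-by-byte copy being suggested might look
roughly like the sketch below, where dname/dlen are the locals from the
patch above and p is a hypothetical write cursor into the output buffer;
junk left behind after an early NUL is harmless because the seqcount
check forces a restart anyway:

	/* bounded by dlen, but never read past the terminating NUL */
	while (dlen--) {
		char c = *dname++;
		if (!c)
			break;
		*p++ = c;
	}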

2013-09-04 19:26:54

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/04/2013 03:05 PM, Waiman Long wrote:
> When running the AIM7's short workload, Linus' lockref patch eliminated
> most of the spinlock contention. However, there were still some left:
>
> 8.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
> |--42.21%-- d_path
> | proc_pid_readlink
> | SyS_readlinkat
> | SyS_readlink
> | system_call
> | __GI___readlink
> |
> |--40.97%-- sys_getcwd
> | system_call
> | __getcwd
>
> The big one here is the rename_lock (seqlock) contention in d_path()
> and the getcwd system call. This patch will eliminate the need to take
> the rename_lock while translating dentries into the full pathnames.
>
> The need to take the rename_lock is to make sure that no rename
> operation can be ongoing while the translation is in progress. However,
> only one thread can take the rename_lock thus blocking all the other
> threads that need it even though the translation process won't make
> any change to the dentries.
>
> This patch will replace the writer's write_seqlock/write_sequnlock
> sequence of the rename_lock of the callers of the prepend_path() and
> __dentry_path() functions with the reader's read_seqbegin/read_seqretry
> sequence within these 2 functions. As a result, the code will have to
> retry if one or more rename operations had been performed. In addition,
> RCU read lock will be taken during the translation process to make
> sure that no dentries will go away.
>
> In addition, the dentry's d_lock is also not taken to further reduce
> spinlock contention. However, this means that the name pointer and
> other dentry fields may not be valid. As a result, the code was
> enhanced to handle the case that the parent pointer or the name
> pointer can be NULL.
>
> With this patch, the _raw_spin_lock will now account for only 1.2%
> of the total CPU cycles for the short workload. This patch also has
> the effect of reducing the effect of running perf on its profile
> since the perf command itself can be a heavy user of the d_path()
> function depending on the complexity of the workload.
>
> When taking the perf profile of the high-systime workload, the amount
> of spinlock contention contributed by running perf without this patch
> was about 16%. With this patch, the spinlock contention caused by
> the running of perf will go away and we will have a more accurate
> perf profile.
>
> Signed-off-by: Waiman Long<[email protected]>
In terms of AIM7 performance, this patch gives a boost of about 6-7%
on top of Linus' lockref patch on an 8-socket, 80-core DL980.

User Range | 10-100 | 200-1000 | 1100-2000 |
Mean JPM w/o patch | 4,365,114 | 7,211,115 | 6,964,439 |
Mean JPM with patch | 3,872,850 | 7,655,958 | 7,422,598 |
% Change | -11.3% | +6.2% | +6.6% |

I am not sure if it is too aggressive to not take the d_lock before
prepend_name(). I can add those locking instructions back if necessary.

Regards,
Longman

2013-09-04 19:33:14

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/04/2013 03:11 PM, Al Viro wrote:
> On Wed, Sep 04, 2013 at 03:05:23PM -0400, Waiman Long wrote:
>>
>> static int prepend_name(char **buffer, int *buflen, struct qstr *name)
>> {
>> - return prepend(buffer, buflen, name->name, name->len);
>> + /*
>> + * With RCU path tracing, it may race with rename. Use
>> + * ACCESS_ONCE() to make sure that it is either the old or
>> + * the new name pointer. The length does not really matter as
>> + * the sequence number check will eventually catch any ongoing
>> + * rename operation.
>> + */
>> + const char *dname = ACCESS_ONCE(name->name);
>> + int dlen = name->len;
>> +
>> + if (unlikely(!dname || !dlen))
>> + return -EINVAL;
>> + return prepend(buffer, buflen, dname, dlen);
> NAK. A race with d_move() can very well leave you with dname pointing into
> an object of length smaller than dlen. You *can* copy it byte-by-byte
> and rely on NUL-termination, but you can't rely on length being accurate -
> not without having excluded d_move().

I have thought about that. But if a d_move() is going on, the string in
the buffer will be discarded as the sequence number will change. So
whether or not it has an embedded null byte shouldn't matter. That is
why I didn't add code to do a byte-by-byte copy in this first patch. I
can add that code if you think it is safer to do so.

Regards,
Longman

2013-09-04 19:43:46

by Al Viro

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 04, 2013 at 03:33:00PM -0400, Waiman Long wrote:

> I have thought about that. But if a d_move() is going on, the string
> in the buffer will be discarded as the sequence number will change.
> So whether or not it have embedded null byte shouldn't matter. That
> is why I didn't add code to do byte-by-byte copy at this first
> patch. I can add code to do that if you think it is safer to do so.

Sigh... Junk in the output is not an issue; reading from invalid address
is, since you might not survive to the sequence number check. Again,
if p is an address returned by kmalloc(size, ...), dereferencing p + offset
is not safe unless offset is less than size.

2013-09-04 21:04:09

by John Stoffel

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

>>>>> "Waiman" == Waiman Long <[email protected]> writes:

Waiman> In term of AIM7 performance, this patch has a performance boost of
Waiman> about 6-7% on top of Linus' lockref patch on a 8-socket 80-core DL980.

Waiman> User Range | 10-100 | 200-10000 | 1100-2000 |
Waiman> Mean JPM w/o patch | 4,365,114 | 7,211,115 | 6,964,439 |
Waiman> Mean JPM with patch | 3,872,850 | 7,655,958 | 7,422,598 |
Waiman> % Change | -11.3% | +6.2% | +6.6% |

This -11% impact is worrisome to me, because at smaller numbers of
users I would still expect the performance to go up. So why the big
drop?

Also, how is the impact of these changes on smaller 1-socket, 4-core
systems? Just because it helps a couple of big boxes doesn't mean it
won't hurt the more common small case.

John

2013-09-04 21:31:53

by Linus Torvalds

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 4, 2013 at 12:05 PM, Waiman Long <[email protected]> wrote:
> + rcu_read_unlock();
> + if (read_seqretry(&rename_lock, seq))
> + goto restart;

Btw, you have this pattern twice, and while it's not necessarily
incorrect, it's a bit worrisome for performance.

The rcu_read_unlock() is very possibly going to trigger an immediate
scheduling event, so checking the sequence lock after dropping the
read-lock sounds like it would make it much easier to hit the race
with some rename.

I'm also a tiny bit worried about livelocking on the sequence lock in
the presence of lots of renames, so I'm wondering if the locking
should try to approximate what we do for the RCU lookup path: start
off optimistically using just the RCU lock and a sequence point, but
if that fails, fall back to taking the real lock. Maybe using a
counter ("try twice, then get the rename lock for writing")

Hmm?

Linus

2013-09-05 01:55:59

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/04/2013 03:43 PM, Al Viro wrote:
> On Wed, Sep 04, 2013 at 03:33:00PM -0400, Waiman Long wrote:
>
>> I have thought about that. But if a d_move() is going on, the string
>> in the buffer will be discarded as the sequence number will change.
>> So whether or not it have embedded null byte shouldn't matter. That
>> is why I didn't add code to do byte-by-byte copy at this first
>> patch. I can add code to do that if you think it is safer to do so.
> Sigh... Junk in the output is not an issue; reading from invalid address
> is, since you might not survive to the sequence number check. Again,
> if p is an address returned by kmalloc(size, ...), dereferencing p + offset
> is not safe unless offset is less than size.

Yeah, I understand that. As said in my reply to Linus, I will use
memchr() to see if there is a null byte within the specified length. If
one is found, I will assume the string is not valid and return an error
to the caller.

Longman

2013-09-05 02:04:46

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/04/2013 04:40 PM, John Stoffel wrote:
>>>>>> "Waiman" == Waiman Long<[email protected]> writes:
> Waiman> In term of AIM7 performance, this patch has a performance boost of
> Waiman> about 6-7% on top of Linus' lockref patch on a 8-socket 80-core DL980.
>
> Waiman> User Range | 10-100 | 200-10000 | 1100-2000 |
> Waiman> Mean JPM w/o patch | 4,365,114 | 7,211,115 | 6,964,439 |
> Waiman> Mean JPM with patch | 3,872,850 | 7,655,958 | 7,422,598 |
> Waiman> % Change | -11.3% | +6.2% | +6.6% |
>
> This -11% impact is worisome to me, because at smaller numbers of
> users, I would still expect the performance to go up. So why the big
> drop?
>
> Also, how is the impact of these changes on smaller 1 socket, 4 core
> systems? Just because it helps a couple of big boxes, doesn't mean it
> won't hurt the more common small case.
>
> John

I don't believe the patch will make it slower with fewer users. It is
more a result of run-to-run variation. The short workload typically
completes in a very short time. In the 10-100 user range, the completion
times range from 0.02-0.11s. With a higher user count, it takes several
seconds to run and hence the results are more reliable.

Longman

2013-09-05 02:17:18

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/04/2013 05:31 PM, Linus Torvalds wrote:
> On Wed, Sep 4, 2013 at 12:05 PM, Waiman Long<[email protected]> wrote:
>> + rcu_read_unlock();
>> + if (read_seqretry(&rename_lock, seq))
>> + goto restart;
> Btw, you have this pattern twice, and while it's not necessarily
> incorrect, it's a bit worrisome for performance.

The rcu_read_unlock sequence in the middle of prepend_path() is not
likely to be executed, so it shouldn't affect performance except for
the conditional check.

> The rcu_read_unlock() is very possibly going to trigger an immediate
> scheduling event, so checking the sequence lock after dropping the
> read-lock sounds like it would make it much easier to hit the race
> with some rename.

I can put read_seqbegin/read_seqretry within the
rcu_read_lock/rcu_read_unlock block. However, read_seqbegin() can spin
for a while if a rename is in progress. So it depends on which is more
important: a shorter RCU critical section at the expense of more
retries, or vice versa.
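
Sketched, that alternative ordering would keep the seqcount check inside
the RCU critical section (the cost being that read_seqbegin() may spin
while the RCU read lock is held):

	rcu_read_lock();
restart:
	seq = read_seqbegin(&rename_lock);	/* may spin during a rename */
	/* ... walk the dentries and build the path ... */
	if (read_seqretry(&rename_lock, seq))
		goto restart;			/* retry without leaving RCU */
	rcu_read_unlock();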

> I'm also a tiny bit worried about livelocking on the sequence lock in
> the presence of lots of renames, so I'm wondering if the locking
> should try to approximate what we do for the RCU lookup path: start
> off optimistically using just the RCU lock and a sequence point, but
> if that fails, fall back to taking the real lock. Maybe using a
> counter ("try twice, then get the rename lock for writing")
>
> Hmm?

Yes, I can implement a counter that switches to taking the rename_lock
if the count reaches 0. It shouldn't be too hard. That should avoid the
possibility of a livelock.

Longman

2013-09-05 02:42:27

by Al Viro

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 04, 2013 at 09:55:43PM -0400, Waiman Long wrote:
> On 09/04/2013 03:43 PM, Al Viro wrote:
> >On Wed, Sep 04, 2013 at 03:33:00PM -0400, Waiman Long wrote:
> >
> >>I have thought about that. But if a d_move() is going on, the string
> >>in the buffer will be discarded as the sequence number will change.
> >>So whether or not it have embedded null byte shouldn't matter. That
> >>is why I didn't add code to do byte-by-byte copy at this first
> >>patch. I can add code to do that if you think it is safer to do so.
> >Sigh... Junk in the output is not an issue; reading from invalid address
> >is, since you might not survive to the sequence number check. Again,
> >if p is an address returned by kmalloc(size, ...), dereferencing p + offset
> >is not safe unless offset is less than size.
>
> Yeah, I understand that. As said in my reply to Linus, I will use
> memchr() to see if there is null byte within the specified length.
> If one is found, I will assume the string is not valid and return
> error to the caller.

Umm... Strictly speaking, memchr() behaviour is undefined if the third
argument exceeds the size of the object pointed to by the first one. IOW,
it has every right to assume that all characters in the range to be
searched are safely readable. You can't assume that it will read
them one by one until it hits the one you are searching for. In practice
it's probably almost[1] true for all our implementations of memchr(), but...

[1] reads past the character being searched for are very likely, but they'll
be within the same page, which is safe.

2013-09-05 02:48:16

by Linus Torvalds

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 4, 2013 at 6:49 PM, Waiman Long <[email protected]> wrote:
>
> So what I am going to do is to use memchr() to locate a null
> byte within the given length. If one is found, the string must be invalid
> and there is no point in doing the copying. Instead, EINVAL will be returned
> and the code will check the sequence number then. The memchr() function can
> be fast if an architecture-specific version exists.

We don't have an architecture-specific fast version of memchr, because
nobody sane has ever cared about that abomination of a function. Also,
if rename() overwrites the pathname (switching inline names), I guess
a zero could disappear between your memchr and the copy. Although I
think we always have an ending NUL byte for the inline case, so that
should make sure that memchr would find it *eventually*.

But regardless, that's really not what you want to do.

You should do:

- use the name length as a maximum

- do a byte-at-a-time copy, stopping at a zero (it's going to be
faster than memchr anyway)

Then, later on, we can do one that does a word-at-a-time using the
CONFIG_DCACHE_WORD_ACCESS magic: we know dentry names are always
word-aligned, and we have an efficient "has_zero()" function for
finding zero bytes in a word.
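
Putting the two bullets above together, prepend_name() might end up
looking something like this sketch (byte-at-a-time only; the
word-at-a-time variant is left for later, and this is not the final
code):

	static int prepend_name(char **buffer, int *buflen, struct qstr *name)
	{
		const char *dname = ACCESS_ONCE(name->name);
		int dlen = ACCESS_ONCE(name->len);
		char *p;

		*buflen -= dlen;		/* name length is the maximum */
		if (*buflen < 0)
			return -ENAMETOOLONG;
		p = *buffer -= dlen;
		while (dlen--) {
			char c = *dname++;
			if (!c)		/* racing rename; seqcount retry cleans up */
				break;
			*p++ = c;
		}
		return 0;
	}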

Really.

Linus

2013-09-05 04:20:07

by Al Viro

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On Wed, Sep 04, 2013 at 07:48:14PM -0700, Linus Torvalds wrote:
> - use the name length as a maximum
>
> - do a byte-at-a-time copy, stopping at a zero (it's going to be
> faster than memchr anyway)
>
> Then, later on, we can do one that does a word-at-a-time using the
> CONFIG_DCACHE_WORD_ACCESS magic: we know dentry names are always
> word-aligned, and we have an efficient "has_zero()" function for
> finding zero bytes in a word.

Umm... Dentry names are word-aligned, but the place where we copy them
doesn't have to be. OTOH, DCACHE_WORD_ACCESS is for architectures where
unaligned stores are fine, so...

2013-09-05 04:30:37

by George Spelvin

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

As long as you're removing locks from prepend_name and complicating its
innards, I notice that each and every call site follows it by prepending
"/". How about moving that into prepend_name as well?

Also, if you happen to feel like it, you can delete the slash flag
and replace it with "bptr != *buffer".
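
With those two changes the walk loop in prepend_path() could shrink to
something like this sketch (assuming prepend_name() now writes the
leading '/' itself; not actual tested code):

	while (dentry != root->dentry || vfsmnt != root->mnt) {
		/* ... mount traversal as before ... */
		error = prepend_name(&bptr, &blen, &dentry->d_name);
		if (error)
			break;
		dentry = parent;
	}
	/* root directory: nothing was emitted, no 'slash' flag needed */
	if (!error && bptr == *buffer)
		error = prepend(&bptr, &blen, "/", 1);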

Another small tweak would be to the global_root part of the code.
You could move the is_mounted(vfsmnt) test up, and combine the tail of
that code path with the regular exit. All you have to do is change
the !slash test to:

	if (error >= 0 && bptr == *buffer) {	/* Root directory */
		if (--blen < 0)
			error = -ENAMETOOLONG;
		else
			*--bptr = '/';
	}

This modified form is no more code than an inlined copy of prepend(),
so we haven't actually slowed the fast path, but it avoids corrupting
the return value of 0/1/2 if possible.

2013-09-05 13:29:45

by John Stoffel

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

>>>>> "Waiman" == Waiman Long <[email protected]> writes:

Waiman> On 09/04/2013 04:40 PM, John Stoffel wrote:
>>>>>>> "Waiman" == Waiman Long<[email protected]> writes:
Waiman> In term of AIM7 performance, this patch has a performance boost of
Waiman> about 6-7% on top of Linus' lockref patch on a 8-socket 80-core DL980.
>>
Waiman> User Range | 10-100 | 200-10000 | 1100-2000 |
Waiman> Mean JPM w/o patch | 4,365,114 | 7,211,115 | 6,964,439 |
Waiman> Mean JPM with patch | 3,872,850 | 7,655,958 | 7,422,598 |
Waiman> % Change | -11.3% | +6.2% | +6.6% |
>>
>> This -11% impact is worisome to me, because at smaller numbers of
>> users, I would still expect the performance to go up. So why the big
>> drop?
>>
>> Also, how is the impact of these changes on smaller 1 socket, 4 core
>> systems? Just because it helps a couple of big boxes, doesn't mean it
>> won't hurt the more common small case.
>>
>> John

Waiman> I don't believe the patch will make it slower with less
Waiman> user. It is more a result of run-to-run variation. The short
Waiman> workload typically completed in a very short time. In the
Waiman> 10-100 user range, the completion times range from
Waiman> 0.02-0.11s. With a higher user count, it needs several seconds
Waiman> to run and hence the results are more reliable.

Can you then show the variation over multiple runs? I think you have
a good justification for larger boxes to make this change; I just
worry about smaller systems getting hit and losing performance.

John

2013-09-05 17:06:50

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/05/2013 12:30 AM, George Spelvin wrote:
> As long as you're removing locks from prepend_name and complicating its
> innards, I notice that each and every call site follows it by prepending
> "/". How about moving that into prepend_name as well?
>
> Also, if you happen to feel like it, you can delete the slash flag
> and replace it with "bptr != *buffer".
>
> Another small tweak would be to the global_root part of the code.
> You could move the is_mounted(vfsmnt) test up, and combine the tail of
> that code path with the regular exit. All you have to do is change
> the !slash test to:
>
> 	if (error >= 0 && bptr == *buffer) {	/* Root directory */
> 		if (--blen < 0)
> 			error = -ENAMETOOLONG;
> 		else
> 			*--bptr = '/';
> 	}
>
> This modified form is no more code than an inlined copy of prepend(),
> so we haven't actually slowed the fast path, but it avoids corrupting
> the return value of 0/1/2 if possible.

Thanks for the suggestions. I will implement them in my v2 patch.

-Longman

2013-09-05 17:29:08

by Waiman Long

Subject: Re: [PATCH] dcache: Translating dentry into pathname without taking rename_lock

On 09/05/2013 09:29 AM, John Stoffel wrote:
> Can you then show the variation over multiple runs? I think you have a
> good justification for larger boxes to make this change, I just worry
> about smaller systems getting hit and losing performance. John

Please see the attached PDF file for the performance level of the short
workload at different numbers of users. You can see that there is quite
a lot of run-to-run variation.

-Longman


Attachments:
aim7-short-jpm-3.11.pdf (77.78 kB)