2017-07-18 23:27:14

by NeilBrown

[permalink] [raw]
Subject: [PATCH 00/12] Assorted lustre fixes and improvements

Hi,
I've been looking through lustre, finding things, and fixing some of
them.
There are three distinct subsets of patches here.
- 1 patch to fix a few typos in comments
- 2 patches to fix dcache issues, particularly around
DCACHE_DISCONNECTED
- 9 patches to clean up code in ldlm_flock(). These started
as some simple list_entry tidy-ups, and just grew....

The most interesting from a review perspective is
staging: lustre: llite: fix various issues with ll_splice_alias.
there are real locking issues in there, but its hard to be 100%
sure I've fixed them all and not introduced new issues.
It would be *really* nice if we could get rid of the concept of
"invalid" dentries, but I don't yet understand it well enough to
propose and alternative.

Thanks,
NeilBrown

---

NeilBrown (12):
staging: lustre: fix minor typos in comments
staging: lustre: llite: fix incorrect DCACHE_DISCONNECTED test
staging: lustre: llite: fix various issues with ll_splice_alias.
staging: lustre: ldlm: remove 'first_enq' arg from ldlm_process_flock_lock()
staging: lustre: ldlm: remove unused 'work_list' arg from ldlm_process_flock_lock()
staging: lustre: ldlm: remove unneeded 'err' arg to ldlm_process_flock_lock()
staging: lustre: ldlm: remove 'flags' arg from ldlm_process_flock_lock()
staging: lustre: ldlm: remove unused 'overlaps' variable
staging: lustre: ldlm: remove 'flags' arg from ldlm_flock_destroy()
staging: lustre: ldlm: tidy list walking in ldlm_flock()
staging: lustre: ldlm: remove unnecessary 'ownlocks' variable.
staging: lustre: ldlm: remove unused field 'fwd_generation'


drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 208 ++++----------------
.../staging/lustre/lustre/llite/llite_internal.h | 2
drivers/staging/lustre/lustre/llite/namei.c | 71 +++----
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 4
4 files changed, 80 insertions(+), 205 deletions(-)

--
Signature


2017-07-18 23:27:23

by NeilBrown

[permalink] [raw]
Subject: [PATCH 01/12] staging: lustre: fix minor typos in comments

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/llite/namei.c | 2 +-
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index a208a8b02c2c..293a3180ec70 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -490,7 +490,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
*de = alias;

if (!it_disposition(it, DISP_LOOKUP_NEG)) {
- /* we have lookup look - unhide dentry */
+ /* We have the "lookup" lock, so unhide dentry */
if (bits & MDS_INODELOCK_LOOKUP)
d_lustre_revalidate(*de);
} else if (!it_disposition(it, DISP_OPEN_CREATE)) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 3eb66cea65db..ae97c6f1aeb0 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1030,7 +1030,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
* If we're performing a creation, that means that unless the creation
* failed with EEXIST, we should fake up a negative dentry.
*
- * For everything else, we want to lookup to succeed.
+ * For everything else, we want the lookup to succeed.
*
* One additional note: if CREATE or OPEN succeeded, we add an extra
* reference to the request because we need to keep it around until
@@ -1040,7 +1040,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
* exactly what it_status refers to.
*
* If DISP_OPEN_OPEN is set, then it_status refers to the open() call,
- * otherwise if DISP_OPEN_CREATE is set, then it status is the
+ * otherwise if DISP_OPEN_CREATE is set, then it_status is the
* creation failure mode. In either case, one of DISP_LOOKUP_NEG or
* DISP_LOOKUP_POS will be set, indicating whether the child lookup
* was successful.


2017-07-18 23:27:28

by NeilBrown

[permalink] [raw]
Subject: [PATCH 02/12] staging: lustre: llite: fix incorrect DCACHE_DISCONNECTED test

It is almost always wrong to test DCACHE_DISCONNECTED, except in
"exportfs" code.
The flag tells us that this dentry *might* not be connected to the
root through a chain of d_parent links. Following the d_parent
to an IS_ROOT() dentry *might* find one that is on the s_anon list
rather than s_root.

The code here need to know if it is safe to call __d_drop(), and
the correct test is "!IS_ROOT(dentry)".
If a dentry IS_ROOT(), then it might be the filesystem root, or it
might be the root of a DCACHE_DISCONNECTED tree, and so be on
the s_anon list. In these two cases it should not be __d_drop()ed.
If !IS_ROOT(), then the dentry is attached to its parent through
d_subdir, and can safely be unhashed.

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/lustre/llite/llite_internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index cd3311abf999..4854985bf4d3 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1299,7 +1299,7 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
* If we unhashed such a dentry, unmount would not be able to find
* it and busy inodes would be reported.
*/
- if (d_count(dentry) == 0 && !(dentry->d_flags & DCACHE_DISCONNECTED))
+ if (d_count(dentry) == 0 && !IS_ROOT(dentry))
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
}


2017-07-18 23:27:38

by NeilBrown

[permalink] [raw]
Subject: [PATCH 03/12] staging: lustre: llite: fix various issues with ll_splice_alias.

1/ The testing of DCACHE_DISCONNECTED is wrong.
see upstream commit da093a9b76ef ("dcache: d_splice_alias should
ignore DCACHE_DISCONNECTED")

As this is a notoriously difficult piece of code to get right,
it makes sense to use d_splice_alias() directly and no try to
create a local version of it.

2/ ll_find_alias() currently:
locks and alias
checks that it is the one we want
unlock it
locks it again
gets a reference
unlocks it

This isn't safe. Anything could happen to the dentry while we
don't hold a reference. We need to dget the reference while
still holding the lock.

3/ The d_move() in ll_splice_alias() is pointless. We have
already checked the hash, name, and parent are the same, and
these are the only fields that d_move() will change.

4/ The call to d_add() is outside of any locking. This makes it
possible for two identical dentries to be added to the same
inode, which would cause confusion.

Prior to 4.7, i_mutex would have provided exclusion, but since
the VFS supports parallel lookups, only a shared lock is held
on i_mutex.

Because ll_d_init() creates a dentry in a state where
ll_dcompare will no recognize it, the VFS provides no guarantee
that we won't have two concurrent calls to ll_lookup_dn() for
the same parent/name.


So: rename ll_find_alias() to ll_find_invalid_alias() and have it
just focus on finding an invalid alias.

For directories, we can just use d__splice_alias() directly.
There must only be one alias for a directory, and
ll_splice_alias() will find it where it is "invalid" or not.

For non-directories, we call ll_find_invalid_alias(), and either
use the result or call d_add(). We need a lock to protect from
races with other threads calling ll_find_invalid_alias() and
d_add() at the same time. lli_lock seems suitable for this
purpose.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/llite/namei.c | 69 +++++++++++++--------------
1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 293a3180ec70..6204c3e70d45 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -378,75 +378,74 @@ void ll_i2gids(__u32 *suppgids, struct inode *i1, struct inode *i2)
}

/*
- * try to reuse three types of dentry:
- * 1. unhashed alias, this one is unhashed by d_invalidate (but it may be valid
- * by concurrent .revalidate).
- * 2. INVALID alias (common case for no valid ldlm lock held, but this flag may
- * be cleared by others calling d_lustre_revalidate).
- * 3. DISCONNECTED alias.
+ * Try to find an "invalid" alias. i.e. one that was unhashed by
+ * d_invalidate(), or that was instantiated with no valid ldlm lock.
+ * These can be rehased by d_lustre_revalidate(), which could race
+ * with this code.
*/
-static struct dentry *ll_find_alias(struct inode *inode, struct dentry *dentry)
+static struct dentry *ll_find_invalid_alias(struct inode *inode,
+ struct dentry *dentry)
{
- struct dentry *alias, *discon_alias, *invalid_alias;
+ struct dentry *alias, *invalid_alias = NULL;

if (hlist_empty(&inode->i_dentry))
return NULL;

- discon_alias = NULL;
- invalid_alias = NULL;
-
spin_lock(&inode->i_lock);
hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
LASSERT(alias != dentry);

spin_lock(&alias->d_lock);
- if ((alias->d_flags & DCACHE_DISCONNECTED) &&
- S_ISDIR(inode->i_mode))
- /* LASSERT(last_discon == NULL); LU-405, bz 20055 */
- discon_alias = alias;
- else if (alias->d_parent == dentry->d_parent &&
- alias->d_name.hash == dentry->d_name.hash &&
- alias->d_name.len == dentry->d_name.len &&
- memcmp(alias->d_name.name, dentry->d_name.name,
- dentry->d_name.len) == 0)
+ if (alias->d_parent == dentry->d_parent &&
+ alias->d_name.hash == dentry->d_name.hash &&
+ alias->d_name.len == dentry->d_name.len &&
+ memcmp(alias->d_name.name, dentry->d_name.name,
+ dentry->d_name.len) == 0) {
+ dget_dlock(alias);
invalid_alias = alias;
+ }
spin_unlock(&alias->d_lock);

if (invalid_alias)
break;
}
- alias = invalid_alias ?: discon_alias ?: NULL;
- if (alias) {
- spin_lock(&alias->d_lock);
- dget_dlock(alias);
- spin_unlock(&alias->d_lock);
- }
spin_unlock(&inode->i_lock);

- return alias;
+ return invalid_alias;
}

/*
- * Similar to d_splice_alias(), but lustre treats invalid alias
- * similar to DCACHE_DISCONNECTED, and tries to use it anyway.
+ * Similar to d_splice_alias(), but also look for an "invalid" alias,
+ * specific to lustre, and use that if found.
*/
struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
{
- if (inode) {
- struct dentry *new = ll_find_alias(inode, de);
+ if (inode && !S_ISDIR(inode->i_mode)) {
+ struct ll_inode_info *lli = ll_i2info(inode);
+ struct dentry *new;
+
+ /* We need lli_lock here as another thread could
+ * be running this code, and i_lock cannot protect us.
+ */
+ spin_lock(&lli->lli_lock);
+ new = ll_find_invalid_alias(inode, de);
+ if (!new)
+ d_add(de, inode);
+ spin_lock(&lli->lli_lock);

if (new) {
- d_move(new, de);
iput(inode);
CDEBUG(D_DENTRY,
"Reuse dentry %p inode %p refc %d flags %#x\n",
new, d_inode(new), d_count(new), new->d_flags);
return new;
}
+ return de;
}
- d_add(de, inode);
- CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
- de, d_inode(de), d_count(de), de->d_flags);
+ de = d_splice_alias(inode, de);
+ if (!IS_ERR(de))
+ CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
+ de, d_inode(de), d_count(de), de->d_flags);
return de;
}



2017-07-18 23:27:43

by NeilBrown

[permalink] [raw]
Subject: [PATCH 04/12] staging: lustre: ldlm: remove 'first_enq' arg from ldlm_process_flock_lock()

it is only ever set to '1', so we can just assume that and remove the code.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index b7f28b39c7b3..8ba3eaf49c65 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -121,15 +121,9 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
* It is also responsible for splitting a lock if a portion of the lock
* is released.
*
- * If \a first_enq is 0 (ie, called from ldlm_reprocess_queue):
- * - blocking ASTs have already been sent
- *
- * If \a first_enq is 1 (ie, called from ldlm_lock_enqueue):
- * - blocking ASTs have not been sent yet, so list of conflicting locks
- * would be collected and ASTs sent.
*/
static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
- int first_enq, enum ldlm_error *err,
+ enum ldlm_error *err,
struct list_head *work_list)
{
struct ldlm_resource *res = req->l_resource;
@@ -197,11 +191,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
if (!ldlm_flocks_overlap(lock, req))
continue;

- if (!first_enq) {
- reprocess_failed = 1;
- continue;
- }
-
if (*flags & LDLM_FL_BLOCK_NOWAIT) {
ldlm_flock_destroy(req, mode, *flags);
*err = -EAGAIN;
@@ -605,7 +594,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
/* We need to reprocess the lock to do merges or splits
* with existing locks owned by this process.
*/
- ldlm_process_flock_lock(lock, &noreproc, 1, &err, NULL);
+ ldlm_process_flock_lock(lock, &noreproc, &err, NULL);
}
unlock_res_and_lock(lock);
return rc;


2017-07-18 23:27:49

by NeilBrown

[permalink] [raw]
Subject: [PATCH 05/12] staging: lustre: ldlm: remove unused 'work_list' arg from ldlm_process_flock_lock()

'work_list' is only set to NULL, and is never used.
So discard it.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 8ba3eaf49c65..dce81fbf3049 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -123,8 +123,7 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
*
*/
static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
- enum ldlm_error *err,
- struct list_head *work_list)
+ enum ldlm_error *err)
{
struct ldlm_resource *res = req->l_resource;
struct ldlm_namespace *ns = ldlm_res_to_ns(res);
@@ -594,7 +593,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
/* We need to reprocess the lock to do merges or splits
* with existing locks owned by this process.
*/
- ldlm_process_flock_lock(lock, &noreproc, &err, NULL);
+ ldlm_process_flock_lock(lock, &noreproc, &err);
}
unlock_res_and_lock(lock);
return rc;


2017-07-18 23:27:57

by NeilBrown

[permalink] [raw]
Subject: [PATCH 06/12] staging: lustre: ldlm: remove unneeded 'err' arg to ldlm_process_flock_lock()

This arg is used to return an error code, but the returned code is never
looked at. So there is no point returning it.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index dce81fbf3049..fbb627a2f6c1 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -122,8 +122,7 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
* is released.
*
*/
-static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
- enum ldlm_error *err)
+static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
{
struct ldlm_resource *res = req->l_resource;
struct ldlm_namespace *ns = ldlm_res_to_ns(res);
@@ -145,8 +144,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
req->l_policy_data.l_flock.start,
req->l_policy_data.l_flock.end);

- *err = ELDLM_OK;
-
/* No blocking ASTs are sent to the clients for
* Posix file & record locks
*/
@@ -192,7 +189,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,

if (*flags & LDLM_FL_BLOCK_NOWAIT) {
ldlm_flock_destroy(req, mode, *flags);
- *err = -EAGAIN;
return LDLM_ITER_STOP;
}

@@ -330,7 +326,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags,
if (IS_ERR(new2)) {
ldlm_flock_destroy(req, lock->l_granted_mode,
*flags);
- *err = PTR_ERR(new2);
return LDLM_ITER_STOP;
}
goto reprocess;
@@ -440,7 +435,6 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
struct obd_import *imp = NULL;
struct ldlm_flock_wait_data fwd;
struct l_wait_info lwi;
- enum ldlm_error err;
int rc = 0;

OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT2, 4);
@@ -593,7 +587,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
/* We need to reprocess the lock to do merges or splits
* with existing locks owned by this process.
*/
- ldlm_process_flock_lock(lock, &noreproc, &err);
+ ldlm_process_flock_lock(lock, &noreproc);
}
unlock_res_and_lock(lock);
return rc;


2017-07-18 23:28:02

by NeilBrown

[permalink] [raw]
Subject: [PATCH 07/12] staging: lustre: ldlm: remove 'flags' arg from ldlm_process_flock_lock()

This is only ever set to LDLM_FL_WAIT_NOREPROC, so we can remove the arg
and discard any code that is only run when it doesn't have that value.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 110 ++++-------------------
1 file changed, 21 insertions(+), 89 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index fbb627a2f6c1..b6d5a83e61cd 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -122,7 +122,7 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
* is released.
*
*/
-static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
+static int ldlm_process_flock_lock(struct ldlm_lock *req)
{
struct ldlm_resource *res = req->l_resource;
struct ldlm_namespace *ns = ldlm_res_to_ns(res);
@@ -138,8 +138,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
const struct ldlm_callback_suite null_cbs = { };

CDEBUG(D_DLMTRACE,
- "flags %#llx owner %llu pid %u mode %u start %llu end %llu\n",
- *flags, new->l_policy_data.l_flock.owner,
+ "owner %llu pid %u mode %u start %llu end %llu\n",
+ new->l_policy_data.l_flock.owner,
new->l_policy_data.l_flock.pid, mode,
req->l_policy_data.l_flock.start,
req->l_policy_data.l_flock.end);
@@ -150,74 +150,16 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
req->l_blocking_ast = NULL;

reprocess:
- if ((*flags == LDLM_FL_WAIT_NOREPROC) || (mode == LCK_NL)) {
- /* This loop determines where this processes locks start
- * in the resource lr_granted list.
- */
- list_for_each(tmp, &res->lr_granted) {
- lock = list_entry(tmp, struct ldlm_lock,
- l_res_link);
- if (ldlm_same_flock_owner(lock, req)) {
- ownlocks = tmp;
- break;
- }
- }
- } else {
- int reprocess_failed = 0;
-
- lockmode_verify(mode);
-
- /* This loop determines if there are existing locks
- * that conflict with the new lock request.
- */
- list_for_each(tmp, &res->lr_granted) {
- lock = list_entry(tmp, struct ldlm_lock,
- l_res_link);
-
- if (ldlm_same_flock_owner(lock, req)) {
- if (!ownlocks)
- ownlocks = tmp;
- continue;
- }
-
- /* locks are compatible, overlap doesn't matter */
- if (lockmode_compat(lock->l_granted_mode, mode))
- continue;
-
- if (!ldlm_flocks_overlap(lock, req))
- continue;
-
- if (*flags & LDLM_FL_BLOCK_NOWAIT) {
- ldlm_flock_destroy(req, mode, *flags);
- return LDLM_ITER_STOP;
- }
-
- if (*flags & LDLM_FL_TEST_LOCK) {
- ldlm_flock_destroy(req, mode, *flags);
- req->l_req_mode = lock->l_granted_mode;
- req->l_policy_data.l_flock.pid =
- lock->l_policy_data.l_flock.pid;
- req->l_policy_data.l_flock.start =
- lock->l_policy_data.l_flock.start;
- req->l_policy_data.l_flock.end =
- lock->l_policy_data.l_flock.end;
- *flags |= LDLM_FL_LOCK_CHANGED;
- return LDLM_ITER_STOP;
- }
-
- ldlm_resource_add_lock(res, &res->lr_waiting, req);
- *flags |= LDLM_FL_BLOCK_GRANTED;
- return LDLM_ITER_STOP;
+ /* This loop determines where this processes locks start
+ * in the resource lr_granted list.
+ */
+ list_for_each(tmp, &res->lr_granted) {
+ lock = list_entry(tmp, struct ldlm_lock,
+ l_res_link);
+ if (ldlm_same_flock_owner(lock, req)) {
+ ownlocks = tmp;
+ break;
}
- if (reprocess_failed)
- return LDLM_ITER_CONTINUE;
- }
-
- if (*flags & LDLM_FL_TEST_LOCK) {
- ldlm_flock_destroy(req, mode, *flags);
- req->l_req_mode = LCK_NL;
- *flags |= LDLM_FL_LOCK_CHANGED;
- return LDLM_ITER_STOP;
}

/* Scan the locks owned by this process that overlap this request.
@@ -267,7 +209,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
}

if (added) {
- ldlm_flock_destroy(lock, mode, *flags);
+ ldlm_flock_destroy(lock, mode,
+ LDLM_FL_WAIT_NOREPROC);
} else {
new = lock;
added = 1;
@@ -293,7 +236,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
new->l_policy_data.l_flock.end + 1;
break;
}
- ldlm_flock_destroy(lock, lock->l_req_mode, *flags);
+ ldlm_flock_destroy(lock, lock->l_req_mode,
+ LDLM_FL_WAIT_NOREPROC);
continue;
}
if (new->l_policy_data.l_flock.end >=
@@ -325,7 +269,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
lock_res_and_lock(req);
if (IS_ERR(new2)) {
ldlm_flock_destroy(req, lock->l_granted_mode,
- *flags);
+ LDLM_FL_WAIT_NOREPROC);
return LDLM_ITER_STOP;
}
goto reprocess;
@@ -354,9 +298,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
&new2->l_remote_handle,
&new2->l_exp_hash);
}
- if (*flags == LDLM_FL_WAIT_NOREPROC)
- ldlm_lock_addref_internal_nolock(new2,
- lock->l_granted_mode);
+ ldlm_lock_addref_internal_nolock(new2,
+ lock->l_granted_mode);

/* insert new2 at lock */
ldlm_resource_add_lock(res, ownlocks, new2);
@@ -377,22 +320,13 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req, __u64 *flags)
ldlm_resource_add_lock(res, ownlocks, req);
}

- if (*flags != LDLM_FL_WAIT_NOREPROC) {
- /* The only one possible case for client-side calls flock
- * policy function is ldlm_flock_completion_ast inside which
- * carries LDLM_FL_WAIT_NOREPROC flag.
- */
- CERROR("Illegal parameter for client-side-only module.\n");
- LBUG();
- }
-
/* In case we're reprocessing the requested lock we can't destroy
* it until after calling ldlm_add_ast_work_item() above so that laawi()
* can bump the reference count on \a req. Otherwise \a req
* could be freed before the completion AST can be sent.
*/
if (added)
- ldlm_flock_destroy(req, mode, *flags);
+ ldlm_flock_destroy(req, mode, LDLM_FL_WAIT_NOREPROC);

ldlm_resource_dump(D_INFO, res);
return LDLM_ITER_CONTINUE;
@@ -582,12 +516,10 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
getlk->fl_start = (loff_t)lock->l_policy_data.l_flock.start;
getlk->fl_end = (loff_t)lock->l_policy_data.l_flock.end;
} else {
- __u64 noreproc = LDLM_FL_WAIT_NOREPROC;
-
/* We need to reprocess the lock to do merges or splits
* with existing locks owned by this process.
*/
- ldlm_process_flock_lock(lock, &noreproc);
+ ldlm_process_flock_lock(lock);
}
unlock_res_and_lock(lock);
return rc;


2017-07-18 23:28:09

by NeilBrown

[permalink] [raw]
Subject: [PATCH 08/12] staging: lustre: ldlm: remove unused 'overlaps' variable

'overlaps' is never used, only incremented.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index b6d5a83e61cd..0fb2c882ab4a 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -133,7 +133,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
struct ldlm_lock *new2 = NULL;
enum ldlm_mode mode = req->l_req_mode;
int added = (mode == LCK_NL);
- int overlaps = 0;
int splitted = 0;
const struct ldlm_callback_suite null_cbs = { };

@@ -226,8 +225,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
lock->l_policy_data.l_flock.start)
break;

- ++overlaps;
-
if (new->l_policy_data.l_flock.start <=
lock->l_policy_data.l_flock.start) {
if (new->l_policy_data.l_flock.end <


2017-07-18 23:28:15

by NeilBrown

[permalink] [raw]
Subject: [PATCH 09/12] staging: lustre: ldlm: remove 'flags' arg from ldlm_flock_destroy()

The only value ever passed in LDLM_FL_WAIT_NOREPROC, so assume that
instead of passing it.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 36 ++++++++++-------------
1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 0fb2c882ab4a..9a888e1ce923 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -88,24 +88,23 @@ ldlm_flocks_overlap(struct ldlm_lock *lock, struct ldlm_lock *new)
}

static inline void
-ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
+ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode)
{
- LDLM_DEBUG(lock, "ldlm_flock_destroy(mode: %d, flags: 0x%llx)",
- mode, flags);
+ LDLM_DEBUG(lock, "ldlm_flock_destroy(mode: %d)",
+ mode);

/* Safe to not lock here, since it should be empty anyway */
LASSERT(hlist_unhashed(&lock->l_exp_flock_hash));

list_del_init(&lock->l_res_link);
- if (flags == LDLM_FL_WAIT_NOREPROC) {
- /* client side - set a flag to prevent sending a CANCEL */
- lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;

- /* when reaching here, it is under lock_res_and_lock(). Thus,
- * need call the nolock version of ldlm_lock_decref_internal
- */
- ldlm_lock_decref_internal_nolock(lock, mode);
- }
+ /* client side - set a flag to prevent sending a CANCEL */
+ lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;
+
+ /* when reaching here, it is under lock_res_and_lock(). Thus,
+ * need call the nolock version of ldlm_lock_decref_internal
+ */
+ ldlm_lock_decref_internal_nolock(lock, mode);

ldlm_lock_destroy_nolock(lock);
}
@@ -208,8 +207,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
}

if (added) {
- ldlm_flock_destroy(lock, mode,
- LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(lock, mode);
} else {
new = lock;
added = 1;
@@ -233,8 +231,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
new->l_policy_data.l_flock.end + 1;
break;
}
- ldlm_flock_destroy(lock, lock->l_req_mode,
- LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(lock, lock->l_req_mode);
continue;
}
if (new->l_policy_data.l_flock.end >=
@@ -265,8 +262,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
NULL, 0, LVB_T_NONE);
lock_res_and_lock(req);
if (IS_ERR(new2)) {
- ldlm_flock_destroy(req, lock->l_granted_mode,
- LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(req, lock->l_granted_mode);
return LDLM_ITER_STOP;
}
goto reprocess;
@@ -323,7 +319,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
* could be freed before the completion AST can be sent.
*/
if (added)
- ldlm_flock_destroy(req, mode, LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(req, mode);

ldlm_resource_dump(D_INFO, res);
return LDLM_ITER_CONTINUE;
@@ -477,7 +473,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
LDLM_DEBUG(lock, "client-side enqueue deadlock received");
rc = -EDEADLK;
}
- ldlm_flock_destroy(lock, mode, LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(lock, mode);
unlock_res_and_lock(lock);

/* Need to wake up the waiter if we were evicted */
@@ -498,7 +494,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
* in the lock changes we can decref the appropriate refcount.
*/
LASSERT(ldlm_is_test_lock(lock));
- ldlm_flock_destroy(lock, getlk->fl_type, LDLM_FL_WAIT_NOREPROC);
+ ldlm_flock_destroy(lock, getlk->fl_type);
switch (lock->l_granted_mode) {
case LCK_PR:
getlk->fl_type = F_RDLCK;


2017-07-18 23:28:27

by NeilBrown

[permalink] [raw]
Subject: [PATCH 10/12] staging: lustre: ldlm: tidy list walking in ldlm_flock()

Use list_for_each_entry variants to
avoid the explicit list_entry() calls.
This allows us to use list_for_each_entry_safe_from()
instread of adding a local list-walking macro.

Also improve some comments so that it is more obvious
that the locks are sorted per-owner and that we need
to find the insertion point.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 45 ++++++++++-------------
1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 9a888e1ce923..58227728a002 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -59,17 +59,6 @@
#include <linux/list.h>
#include "ldlm_internal.h"

-/**
- * list_for_remaining_safe - iterate over the remaining entries in a list
- * and safeguard against removal of a list entry.
- * \param pos the &struct list_head to use as a loop counter. pos MUST
- * have been initialized prior to using it in this macro.
- * \param n another &struct list_head to use as temporary storage
- * \param head the head for your list.
- */
-#define list_for_remaining_safe(pos, n, head) \
- for (n = pos->next; pos != (head); pos = n, n = pos->next)
-
static inline int
ldlm_same_flock_owner(struct ldlm_lock *lock, struct ldlm_lock *new)
{
@@ -125,8 +114,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
{
struct ldlm_resource *res = req->l_resource;
struct ldlm_namespace *ns = ldlm_res_to_ns(res);
- struct list_head *tmp;
- struct list_head *ownlocks = NULL;
+ struct ldlm_lock *tmp;
+ struct ldlm_lock *ownlocks = NULL;
struct ldlm_lock *lock = NULL;
struct ldlm_lock *new = req;
struct ldlm_lock *new2 = NULL;
@@ -151,23 +140,23 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
/* This loop determines where this processes locks start
* in the resource lr_granted list.
*/
- list_for_each(tmp, &res->lr_granted) {
- lock = list_entry(tmp, struct ldlm_lock,
- l_res_link);
+ list_for_each_entry(lock, &res->lr_granted, l_res_link) {
if (ldlm_same_flock_owner(lock, req)) {
- ownlocks = tmp;
+ ownlocks = lock;
break;
}
}

- /* Scan the locks owned by this process that overlap this request.
+ /* Scan the locks owned by this process to find the insertion point
+ * (as locks are ordered), and to handle overlaps.
* We may have to merge or split existing locks.
*/
- if (!ownlocks)
- ownlocks = &res->lr_granted;
-
- list_for_remaining_safe(ownlocks, tmp, &res->lr_granted) {
- lock = list_entry(ownlocks, struct ldlm_lock, l_res_link);
+ if (ownlocks)
+ lock = ownlocks;
+ else
+ lock = list_entry(&res->lr_granted,
+ struct ldlm_lock, l_res_link);
+ list_for_each_entry_safe_from(lock, tmp, &res->lr_granted, l_res_link) {

if (!ldlm_same_flock_owner(lock, new))
break;
@@ -295,7 +284,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
lock->l_granted_mode);

/* insert new2 at lock */
- ldlm_resource_add_lock(res, ownlocks, new2);
+ ldlm_resource_add_lock(res, &lock->l_res_link, new2);
LDLM_LOCK_RELEASE(new2);
break;
}
@@ -309,8 +298,12 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)

if (!added) {
list_del_init(&req->l_res_link);
- /* insert new lock before ownlocks in list. */
- ldlm_resource_add_lock(res, ownlocks, req);
+ /* insert new lock before "lock", which might be
+ * the next lock for this owner, or might be the first
+ * lock for the next owner, order might not be a lock
+ * at all, but instead points at the head of the list
+ */
+ ldlm_resource_add_lock(res, &lock->l_res_link, req);
}

/* In case we're reprocessing the requested lock we can't destroy


2017-07-18 23:28:34

by NeilBrown

[permalink] [raw]
Subject: [PATCH 11/12] staging: lustre: ldlm: remove unnecessary 'ownlocks' variable.

Now that the code has been simplified, 'ownlocks' is not
necessary.

The loop which sets it exits with 'lock' having the same value as
'ownlocks', or point to the head of the list if ownlocks is NULL.

The current code then tests ownlocks and sets 'lock' to exact the
value that it currently has.

So discard 'ownlocks'.

Also remove unnecessary initialization of 'lock'.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 58227728a002..4e8808103437 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -115,8 +115,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
struct ldlm_resource *res = req->l_resource;
struct ldlm_namespace *ns = ldlm_res_to_ns(res);
struct ldlm_lock *tmp;
- struct ldlm_lock *ownlocks = NULL;
- struct ldlm_lock *lock = NULL;
+ struct ldlm_lock *lock;
struct ldlm_lock *new = req;
struct ldlm_lock *new2 = NULL;
enum ldlm_mode mode = req->l_req_mode;
@@ -140,22 +139,14 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
/* This loop determines where this processes locks start
* in the resource lr_granted list.
*/
- list_for_each_entry(lock, &res->lr_granted, l_res_link) {
- if (ldlm_same_flock_owner(lock, req)) {
- ownlocks = lock;
+ list_for_each_entry(lock, &res->lr_granted, l_res_link)
+ if (ldlm_same_flock_owner(lock, req))
break;
- }
- }

/* Scan the locks owned by this process to find the insertion point
* (as locks are ordered), and to handle overlaps.
* We may have to merge or split existing locks.
*/
- if (ownlocks)
- lock = ownlocks;
- else
- lock = list_entry(&res->lr_granted,
- struct ldlm_lock, l_res_link);
list_for_each_entry_safe_from(lock, tmp, &res->lr_granted, l_res_link) {

if (!ldlm_same_flock_owner(lock, new))


2017-07-18 23:28:41

by NeilBrown

[permalink] [raw]
Subject: [PATCH 12/12] staging: lustre: ldlm: remove unused field 'fwd_generation'

With this field gone, we don't need local variables 'imp' or 'obd'
any more.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 21 +++------------------
1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 4e8808103437..b72a195d9635 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -311,7 +311,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)

struct ldlm_flock_wait_data {
struct ldlm_lock *fwd_lock;
- int fwd_generation;
};

static void
@@ -342,11 +341,9 @@ int
ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
{
struct file_lock *getlk = lock->l_ast_data;
- struct obd_device *obd;
- struct obd_import *imp = NULL;
- struct ldlm_flock_wait_data fwd;
- struct l_wait_info lwi;
- int rc = 0;
+ struct ldlm_flock_wait_data fwd;
+ struct l_wait_info lwi;
+ int rc = 0;

OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT2, 4);
if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT3)) {
@@ -374,18 +371,6 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)

LDLM_DEBUG(lock, "client-side enqueue returned a blocked lock, sleeping");
fwd.fwd_lock = lock;
- obd = class_exp2obd(lock->l_conn_export);
-
- /* if this is a local lock, there is no import */
- if (obd)
- imp = obd->u.cli.cl_import;
-
- if (imp) {
- spin_lock(&imp->imp_lock);
- fwd.fwd_generation = imp->imp_generation;
- spin_unlock(&imp->imp_lock);
- }
-
lwi = LWI_TIMEOUT_INTR(0, NULL, ldlm_flock_interrupted_wait, &fwd);

/* Go to sleep until the lock is granted. */


2017-07-19 02:51:41

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH 03/12] staging: lustre: llite: fix various issues with ll_splice_alias.

Unfortunately this patch causes insta-crash on first stat call after mount.
Sorry, I cannot dig into this deeper right this moment, but I will a bit later.
I am adding Al that we discussed this code at some length and he found no problems
here, so I am a bit surprised by your findings.
Also the reason we reinvent the d_splice_alias is because we need to
splice not just directories, but also regular files.

I also am less sure by your previous DCACHE_DISCONECTED patch that we in fact might
still need, I just need to dig up a test case for that.

Thanks for looking into it!

[ 170.000858] Lustre: Mounted lustre-client
[ 172.799813] Lustre: DEBUG MARKER: Using TIMEOUT=20
[ 186.627954] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 186.628742] IP: __lock_acquire+0x125/0x1370
[ 186.629137] PGD 0
[ 186.629138] P4D 0
[ 186.629496]
[ 186.630216] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 186.630613] Modules linked in: osc(C) mgc(C) lustre(C) lmv(C) fld(C) mdc(C) fid(C) lov(C) ksocklnd(C) ptlrpc(C) obdclass(C) lnet(C) sha512_ssse3 sha512_generic crc32_generic libcfs(C) joydev pcspkr i2c_piix4 virtio_console rpcsec_gss_krb5 ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw virtio_blk floppy
[ 186.633238] CPU: 2 PID: 10897 Comm: sh Tainted: G C 4.13.0-rc1-vm-nfs+ #154
[ 186.633990] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 186.634402] task: ffff8800d6a1c180 task.stack: ffffc900066a0000
[ 186.634807] RIP: 0010:__lock_acquire+0x125/0x1370
[ 186.635206] RSP: 0018:ffffc900066a3750 EFLAGS: 00010002
[ 186.635597] RAX: 0000000000000046 RBX: 0000000000000001 RCX: 0000000000000000
[ 186.636013] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 186.636432] RBP: ffffc900066a3810 R08: ffffffffa05221a9 R09: 0000000000000000
[ 186.636844] R10: 0000000000000000 R11: ffff8800d6a1c180 R12: 0000000000000001
[ 186.637264] R13: 0000000000000000 R14: 0000000000000001 R15: 00000000000000a8
[ 186.637679] FS: 00007fd679eaa700(0000) GS:ffff88011a200000(0000) knlGS:0000000000000000
[ 186.638402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 186.638803] CR2: 00000000000000a8 CR3: 0000000109c0b000 CR4: 00000000000006e0
[ 186.639228] Call Trace:
[ 186.639589] lock_acquire+0xe3/0x1d0
[ 186.639962] ? lock_acquire+0xe3/0x1d0
[ 186.640352] ? ll_lookup_it_finish+0x379/0xca0 [lustre]
[ 186.640747] _raw_spin_lock+0x34/0x70
[ 186.641130] ? ll_lookup_it_finish+0x379/0xca0 [lustre]
[ 186.641529] ll_lookup_it_finish+0x379/0xca0 [lustre]
[ 186.641943] ? req_capsule_server_get+0x15/0x20 [ptlrpc]
[ 186.642368] ? lmv_revalidate_slaves+0x790/0x790 [lmv]
[ 186.642779] ll_lookup_it+0x26d/0x820 [lustre]
[ 186.643175] ll_lookup_nd+0x162/0x1a0 [lustre]
[ 186.643575] lookup_slow+0x132/0x220
[ 186.643947] ? __wake_up+0x23/0x50
[ 186.644322] walk_component+0x1bf/0x350
[ 186.644714] link_path_walk+0x1b8/0x630
[ 186.645097] path_lookupat+0x99/0x220
[ 186.645459] ? __kernel_map_pages+0x131/0x140
[ 186.645830] ? __kernel_map_pages+0x131/0x140
[ 186.646206] filename_lookup+0xb8/0x1a0
[ 186.646575] ? __check_object_size+0xb1/0x1a0
[ 186.646952] ? strncpy_from_user+0x4d/0x160
[ 186.647326] user_path_at_empty+0x36/0x40
[ 186.647694] ? user_path_at_empty+0x36/0x40
[ 186.648068] vfs_statx+0x76/0xe0
[ 186.648429] SYSC_newstat+0x3d/0x70
[ 186.648789] ? trace_hardirqs_on_caller+0xf4/0x190
[ 186.649171] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 186.649547] SyS_newstat+0xe/0x10
[ 186.649906] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 186.650289] RIP: 0033:0x7fd679599475
[ 186.650651] RSP: 002b:00007ffc2e568fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[ 186.651350] RAX: ffffffffffffffda RBX: 00007fd679862ae0 RCX: 00007fd679599475
[ 186.651758] RDX: 00007ffc2e568fe0 RSI: 00007ffc2e568fe0 RDI: 000000eec2877730
[ 186.652171] RBP: 00007fd679862ae0 R08: 000000eec2cafcb0 R09: 0000000000000008
[ 186.652579] R10: 000000eec2cafcb0 R11: 0000000000000246 R12: 0000000000000020
[ 186.652982] R13: 000000eec2cc7e10 R14: 000000eec2cb2190 R15: 00007ffc2e5690f8
[ 186.653393] Code: c8 65 48 33 3c 25 28 00 00 00 44 89 e0 0f 85 91 0d 00 00 48 81 c4 90 00 00 00 5b 41 5a 41 5c 41 5d 41 5e 41 5f 5d 49 8d 62 f8 c3 <49> 81 3f 00 90 6f 82 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f
[ 186.654536] RIP: __lock_acquire+0x125/0x1370 RSP: ffffc900066a3750
[ 186.654933] CR2: 00000000000000a8


On Jul 18, 2017, at 7:26 PM, NeilBrown wrote:

> 1/ The testing of DCACHE_DISCONNECTED is wrong.
> see upstream commit da093a9b76ef ("dcache: d_splice_alias should
> ignore DCACHE_DISCONNECTED")
>
> As this is a notoriously difficult piece of code to get right,
> it makes sense to use d_splice_alias() directly and no try to
> create a local version of it.
>
> 2/ ll_find_alias() currently:
> locks and alias
> checks that it is the one we want
> unlock it
> locks it again
> gets a reference
> unlocks it
>
> This isn't safe. Anything could happen to the dentry while we
> don't hold a reference. We need to dget the reference while
> still holding the lock.
>
> 3/ The d_move() in ll_splice_alias() is pointless. We have
> already checked the hash, name, and parent are the same, and
> these are the only fields that d_move() will change.
>
> 4/ The call to d_add() is outside of any locking. This makes it
> possible for two identical dentries to be added to the same
> inode, which would cause confusion.
>
> Prior to 4.7, i_mutex would have provided exclusion, but since
> the VFS supports parallel lookups, only a shared lock is held
> on i_mutex.
>
> Because ll_d_init() creates a dentry in a state where
> ll_dcompare will no recognize it, the VFS provides no guarantee
> that we won't have two concurrent calls to ll_lookup_dn() for
> the same parent/name.
>
>
> So: rename ll_find_alias() to ll_find_invalid_alias() and have it
> just focus on finding an invalid alias.
>
> For directories, we can just use d__splice_alias() directly.
> There must only be one alias for a directory, and
> ll_splice_alias() will find it where it is "invalid" or not.
>
> For non-directories, we call ll_find_invalid_alias(), and either
> use the result or call d_add(). We need a lock to protect from
> races with other threads calling ll_find_invalid_alias() and
> d_add() at the same time. lli_lock seems suitable for this
> purpose.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/llite/namei.c | 69 +++++++++++++--------------
> 1 file changed, 34 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
> index 293a3180ec70..6204c3e70d45 100644
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -378,75 +378,74 @@ void ll_i2gids(__u32 *suppgids, struct inode *i1, struct inode *i2)
> }
>
> /*
> - * try to reuse three types of dentry:
> - * 1. unhashed alias, this one is unhashed by d_invalidate (but it may be valid
> - * by concurrent .revalidate).
> - * 2. INVALID alias (common case for no valid ldlm lock held, but this flag may
> - * be cleared by others calling d_lustre_revalidate).
> - * 3. DISCONNECTED alias.
> + * Try to find an "invalid" alias. i.e. one that was unhashed by
> + * d_invalidate(), or that was instantiated with no valid ldlm lock.
> + * These can be rehased by d_lustre_revalidate(), which could race
> + * with this code.
> */
> -static struct dentry *ll_find_alias(struct inode *inode, struct dentry *dentry)
> +static struct dentry *ll_find_invalid_alias(struct inode *inode,
> + struct dentry *dentry)
> {
> - struct dentry *alias, *discon_alias, *invalid_alias;
> + struct dentry *alias, *invalid_alias = NULL;
>
> if (hlist_empty(&inode->i_dentry))
> return NULL;
>
> - discon_alias = NULL;
> - invalid_alias = NULL;
> -
> spin_lock(&inode->i_lock);
> hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
> LASSERT(alias != dentry);
>
> spin_lock(&alias->d_lock);
> - if ((alias->d_flags & DCACHE_DISCONNECTED) &&
> - S_ISDIR(inode->i_mode))
> - /* LASSERT(last_discon == NULL); LU-405, bz 20055 */
> - discon_alias = alias;
> - else if (alias->d_parent == dentry->d_parent &&
> - alias->d_name.hash == dentry->d_name.hash &&
> - alias->d_name.len == dentry->d_name.len &&
> - memcmp(alias->d_name.name, dentry->d_name.name,
> - dentry->d_name.len) == 0)
> + if (alias->d_parent == dentry->d_parent &&
> + alias->d_name.hash == dentry->d_name.hash &&
> + alias->d_name.len == dentry->d_name.len &&
> + memcmp(alias->d_name.name, dentry->d_name.name,
> + dentry->d_name.len) == 0) {
> + dget_dlock(alias);
> invalid_alias = alias;
> + }
> spin_unlock(&alias->d_lock);
>
> if (invalid_alias)
> break;
> }
> - alias = invalid_alias ?: discon_alias ?: NULL;
> - if (alias) {
> - spin_lock(&alias->d_lock);
> - dget_dlock(alias);
> - spin_unlock(&alias->d_lock);
> - }
> spin_unlock(&inode->i_lock);
>
> - return alias;
> + return invalid_alias;
> }
>
> /*
> - * Similar to d_splice_alias(), but lustre treats invalid alias
> - * similar to DCACHE_DISCONNECTED, and tries to use it anyway.
> + * Similar to d_splice_alias(), but also look for an "invalid" alias,
> + * specific to lustre, and use that if found.
> */
> struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
> {
> - if (inode) {
> - struct dentry *new = ll_find_alias(inode, de);
> + if (inode && !S_ISDIR(inode->i_mode)) {
> + struct ll_inode_info *lli = ll_i2info(inode);
> + struct dentry *new;
> +
> + /* We need lli_lock here as another thread could
> + * be running this code, and i_lock cannot protect us.
> + */
> + spin_lock(&lli->lli_lock);
> + new = ll_find_invalid_alias(inode, de);
> + if (!new)
> + d_add(de, inode);
> + spin_lock(&lli->lli_lock);
>
> if (new) {
> - d_move(new, de);
> iput(inode);
> CDEBUG(D_DENTRY,
> "Reuse dentry %p inode %p refc %d flags %#x\n",
> new, d_inode(new), d_count(new), new->d_flags);
> return new;
> }
> + return de;
> }
> - d_add(de, inode);
> - CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
> - de, d_inode(de), d_count(de), de->d_flags);
> + de = d_splice_alias(inode, de);
> + if (!IS_ERR(de))
> + CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
> + de, d_inode(de), d_count(de), de->d_flags);
> return de;
> }
>
>

2017-07-19 04:33:45

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 03/12] staging: lustre: llite: fix various issues with ll_splice_alias.

On Tue, Jul 18 2017, Oleg Drokin wrote:

> Unfortunately this patch causes insta-crash on first stat call after mount.
> Sorry, I cannot dig into this deeper right this moment, but I will a bit later.

V.strange. The crash suggests that the lock, and hence the inode, is
not initialized. I cannot see how that might happen.
though...

>> + spin_lock(&lli->lli_lock);
>> + new = ll_find_invalid_alias(inode, de);
>> + if (!new)
>> + d_add(de, inode);
>> + spin_lock(&lli->lli_lock);

Had it not crashed, it would have deadlocked. That second spin_lock()
should be spin_unlock() :-( I don't *think* that would have caused this crash...

> I am adding Al that we discussed this code at some length and he found no problems
> here, so I am a bit surprised by your findings.

I'd be very happy to read Al's thoughts.

> Also the reason we reinvent the d_splice_alias is because we need to
> splice not just directories, but also regular files.

I see that. A key simplification I bring is that directories and
non-directories can be handled separately. d_splice_alias() does
all we need for directories, and nothing useful for non-dirs.

Thanks,
NeilBrown


Attachments:
signature.asc (832.00 B)

2017-07-19 06:18:56

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH 03/12] staging: lustre: llite: fix various issues with ll_splice_alias.


On Jul 19, 2017, at 12:33 AM, NeilBrown wrote:

> On Tue, Jul 18 2017, Oleg Drokin wrote:
>
>> Unfortunately this patch causes insta-crash on first stat call after mount.
>> Sorry, I cannot dig into this deeper right this moment, but I will a bit later.
>
> V.strange. The crash suggests that the lock, and hence the inode, is
> not initialized. I cannot see how that might happen.
> though...
>
>>> + spin_lock(&lli->lli_lock);
>>> + new = ll_find_invalid_alias(inode, de);
>>> + if (!new)
>>> + d_add(de, inode);
>>> + spin_lock(&lli->lli_lock);
>
> Had it not crashed, it would have deadlocked. That second spin_lock()
> should be spin_unlock() :-( I don't *think* that would have caused this crash?

No, that's not it.
- d_add(de, inode);
- CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
- de, d_inode(de), d_count(de), de->d_flags);
+ de = d_splice_alias(inode, de);
+ if (!IS_ERR(de))
+ CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
+ de, d_inode(de), d_count(de), de->d_flags);
return de;

This is likely it.
d_splice_alias would return NULL if there's no alias, after d_add of the de.
But ll_splice_alias callers expect either an ERR_PTR on error or dentry to use
otherwise.
So in ll_lookup_it_finish() we have:
alias = ll_splice_alias(inode, *de);
if (IS_ERR(alias)) {
rc = PTR_ERR(alias);
goto out;
}
*de = alias;
?
if (md_revalidate_lock(ll_i2mdexp(parent), &parent_it, &fid,
NULL)) {
===>>> whoops! d_lustre_revalidate(*de);
ll_intent_release(&parent_it);
}

>> I am adding Al that we discussed this code at some length and he found no problems
>> here, so I am a bit surprised by your findings.
> I'd be very happy to read Al's thoughts.

Some of the discussions were in lkml under subject of "More parallel
atomic_open/d_splice_alias fun with NFS and possibly more FSes"
in July 2016, in between nfs stuff.
There was more but I cannot readily find it.

>> Also the reason we reinvent the d_splice_alias is because we need to
>> splice not just directories, but also regular files.
>
> I see that. A key simplification I bring is that directories and
> non-directories can be handled separately. d_splice_alias() does
> all we need for directories, and nothing useful for non-dirs.

I see.
I still need to think some more about this whole thing.

Please see commit 99f1c013194e64d4b67d5d318148303b0e1585e1 for double d_add
of the same dentry fix.6

2017-07-19 08:30:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 01/12] staging: lustre: fix minor typos in comments

On Wed, Jul 19, 2017 at 09:26:47AM +1000, NeilBrown wrote:
> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/llite/namei.c | 2 +-
> drivers/staging/lustre/lustre/mdc/mdc_locks.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)

As "obvious" as this is, I can't take patches without any changelog text
at all, sorry.

greg k-h

2017-08-02 03:26:24

by NeilBrown

[permalink] [raw]
Subject: [PATCH] staging: lustre: fix minor typos in comments


Fix minor typos in comments.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/llite/namei.c | 2 +-
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 8ed24ec1255d..7a1133661a65 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -488,7 +488,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
*de = alias;

if (!it_disposition(it, DISP_LOOKUP_NEG)) {
- /* we have lookup look - unhide dentry */
+ /* We have the "lookup" lock, so unhide dentry */
if (bits & MDS_INODELOCK_LOOKUP)
d_lustre_revalidate(*de);
} else if (!it_disposition(it, DISP_OPEN_CREATE)) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 3eb66cea65db..ae97c6f1aeb0 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1030,7 +1030,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
* If we're performing a creation, that means that unless the creation
* failed with EEXIST, we should fake up a negative dentry.
*
- * For everything else, we want to lookup to succeed.
+ * For everything else, we want the lookup to succeed.
*
* One additional note: if CREATE or OPEN succeeded, we add an extra
* reference to the request because we need to keep it around until
@@ -1040,7 +1040,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
* exactly what it_status refers to.
*
* If DISP_OPEN_OPEN is set, then it_status refers to the open() call,
- * otherwise if DISP_OPEN_CREATE is set, then it status is the
+ * otherwise if DISP_OPEN_CREATE is set, then it_status is the
* creation failure mode. In either case, one of DISP_LOOKUP_NEG or
* DISP_LOOKUP_POS will be set, indicating whether the child lookup
* was successful.
--
2.12.2


Attachments:
signature.asc (832.00 B)

2017-08-03 17:29:18

by James Simmons

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH] staging: lustre: fix minor typos in comments


> Fix minor typos in comments.

Acked-by: James Simmons <[email protected]>

> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/llite/namei.c | 2 +-
> drivers/staging/lustre/lustre/mdc/mdc_locks.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
> index 8ed24ec1255d..7a1133661a65 100644
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -488,7 +488,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> *de = alias;
>
> if (!it_disposition(it, DISP_LOOKUP_NEG)) {
> - /* we have lookup look - unhide dentry */
> + /* We have the "lookup" lock, so unhide dentry */
> if (bits & MDS_INODELOCK_LOOKUP)
> d_lustre_revalidate(*de);
> } else if (!it_disposition(it, DISP_OPEN_CREATE)) {
> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
> index 3eb66cea65db..ae97c6f1aeb0 100644
> --- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
> +++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
> @@ -1030,7 +1030,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
> * If we're performing a creation, that means that unless the creation
> * failed with EEXIST, we should fake up a negative dentry.
> *
> - * For everything else, we want to lookup to succeed.
> + * For everything else, we want the lookup to succeed.
> *
> * One additional note: if CREATE or OPEN succeeded, we add an extra
> * reference to the request because we need to keep it around until
> @@ -1040,7 +1040,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
> * exactly what it_status refers to.
> *
> * If DISP_OPEN_OPEN is set, then it_status refers to the open() call,
> - * otherwise if DISP_OPEN_CREATE is set, then it status is the
> + * otherwise if DISP_OPEN_CREATE is set, then it_status is the
> * creation failure mode. In either case, one of DISP_LOOKUP_NEG or
> * DISP_LOOKUP_POS will be set, indicating whether the child lookup
> * was successful.
> --
> 2.12.2
>
>