2018-05-02 18:23:43

by James Simmons

Subject: [PATCH 0/4] staging: lustre: obdclass: missing lu_object fixes

While following Neil's ongoing lu_object work I noticed him solving the
same problems as the Intel developers, with a very similar approach.
With his changes coming we don't want to lose these important fixes.
This posting is meant mainly for a basic review, since in the end
Neil's work and this work will be combined in some fashion. Note this
patch set is based on top of Neil's cleanup patches for lu_object
published a few days ago.

Hongchao Zhang (1):
staging: lustre: obdclass: guarantee all keys filled

John L. Hammond (1):
staging: lustre: obdclass: hoist locking in lu_context_exit()

Lai Siyao (1):
staging: lustre: obdclass: change object lookup to no wait mode

Li Xi (1):
staging: lustre: obdclass: change spinlock of key to rwlock

drivers/staging/lustre/lustre/include/lu_object.h | 2 +-
drivers/staging/lustre/lustre/obdclass/lu_object.c | 153 +++++++++++----------
2 files changed, 83 insertions(+), 72 deletions(-)

--
1.8.3.1



2018-05-02 18:22:43

by James Simmons

Subject: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

From: Lai Siyao <[email protected]>

Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
to remove it from the cache, but this may lead to deadlock, because
when another process looks up such an object, it has to wait for the
object to be released (done at the last refcount put), while that
process may already hold an LDLM lock.

Now that the current code can handle dying objects correctly, we can
simply return such an object from lookup, and the above deadlock is
avoided.
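To make the deadlock concrete, the interleaving looks roughly like
this (an illustrative sketch reconstructed from the description
above, not code from the patch):

    Process A (lookup)                 Process B (last user of object)
    ------------------                 -------------------------------
    takes an LDLM lock                 sets LU_OBJECT_HEARD_BANSHEE,
                                       still holds a reference
    htable_lookup() finds the dying
    object and sleeps until it is      blocks on the LDLM lock that A
    freed at the last refcount put     holds, so the reference is
                                       never put
            => neither process can make progress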

Signed-off-by: Lai Siyao <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9049
Reviewed-on: https://review.whamcloud.com/26965
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Cliff White <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
drivers/staging/lustre/lustre/include/lu_object.h | 2 +-
drivers/staging/lustre/lustre/obdclass/lu_object.c | 82 +++++++++-------------
2 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
index f29bbca..232063a 100644
--- a/drivers/staging/lustre/lustre/include/lu_object.h
+++ b/drivers/staging/lustre/lustre/include/lu_object.h
@@ -673,7 +673,7 @@ static inline void lu_object_get(struct lu_object *o)
}

/**
- * Return true of object will not be cached after last reference to it is
+ * Return true if object will not be cached after last reference to it is
* released.
*/
static inline int lu_object_is_dying(const struct lu_object_header *h)
diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 8b507f1..9311703 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -589,19 +589,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
const struct lu_fid *f,
__u64 *version)
{
- struct cfs_hash *hs = s->ls_obj_hash;
struct lu_site_bkt_data *bkt;
struct lu_object_header *h;
struct hlist_node *hnode;
- __u64 ver;
- wait_queue_entry_t waiter;
+ u64 ver = cfs_hash_bd_version_get(bd);

-retry:
- ver = cfs_hash_bd_version_get(bd);
-
- if (*version == ver) {
+ if (*version == ver)
return ERR_PTR(-ENOENT);
- }

*version = ver;
bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
@@ -615,31 +609,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
}

h = container_of(hnode, struct lu_object_header, loh_hash);
- if (likely(!lu_object_is_dying(h))) {
- cfs_hash_get(s->ls_obj_hash, hnode);
- lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
- if (!list_empty(&h->loh_lru)) {
- list_del_init(&h->loh_lru);
- percpu_counter_dec(&s->ls_lru_len_counter);
- }
- return lu_object_top(h);
+ cfs_hash_get(s->ls_obj_hash, hnode);
+ lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
+ if (!list_empty(&h->loh_lru)) {
+ list_del_init(&h->loh_lru);
+ percpu_counter_dec(&s->ls_lru_len_counter);
}
-
- /*
- * Lookup found an object being destroyed this object cannot be
- * returned (to assure that references to dying objects are eventually
- * drained), and moreover, lookup has to wait until object is freed.
- */
-
- init_waitqueue_entry(&waiter, current);
- add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
- set_current_state(TASK_UNINTERRUPTIBLE);
- lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
- cfs_hash_bd_unlock(hs, bd, 1);
- schedule();
- remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
- cfs_hash_bd_lock(hs, bd, 1);
- goto retry;
+ return lu_object_top(h);
}

/**
@@ -680,6 +656,8 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
}

/**
+ * Core logic of lu_object_find*() functions.
+ *
* Much like lu_object_find(), but top level device of object is specifically
* \a dev rather than top level device of the site. This interface allows
* objects of different "stacking" to be created within the same site.
@@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
* It is unnecessary to perform lookup-alloc-lookup-insert, instead,
* just alloc and insert directly.
*
+ * If dying object is found during index search, add @waiter to the
+ * site wait-queue and return ERR_PTR(-EAGAIN).
*/
- s = dev->ld_site;
- hs = s->ls_obj_hash;
+ if (conf && conf->loc_flags & LOC_F_NEW) {
+ o = lu_object_alloc(env, dev, f, conf);
+ if (unlikely(IS_ERR(o)))
+ return o;
+
+ hs = dev->ld_site->ls_obj_hash;
+ cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
+ cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
+ cfs_hash_bd_unlock(hs, &bd, 1);

- cfs_hash_bd_get(hs, f, &bd);
- if (!(conf && conf->loc_flags & LOC_F_NEW)) {
- cfs_hash_bd_lock(hs, &bd, 0);
- o = htable_lookup(s, &bd, f, &version);
- cfs_hash_bd_unlock(hs, &bd, 0);
+ lu_object_limit(env, dev);

- if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
- return o;
+ return o;
}
+
+ s = dev->ld_site;
+ hs = s->ls_obj_hash;
+ cfs_hash_bd_get_and_lock(hs, f, &bd, 1);
+ o = htable_lookup(s, &bd, f, &version);
+ cfs_hash_bd_unlock(hs, &bd, 0);
+ if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
+ return o;
+
/*
* Allocate new object. This may result in rather complicated
* operations, including fld queries, inode loading, etc.
*/
o = lu_object_alloc(env, dev, f, conf);
- if (IS_ERR(o))
+ if (unlikely(IS_ERR(o)))
return o;

LASSERT(lu_fid_eq(lu_object_fid(o), f));

cfs_hash_bd_lock(hs, &bd, 1);

- if (conf && conf->loc_flags & LOC_F_NEW)
- shadow = ERR_PTR(-ENOENT);
- else
- shadow = htable_lookup(s, &bd, f, &version);
- if (likely(PTR_ERR(shadow) == -ENOENT)) {
+ shadow = htable_lookup(s, &bd, f, &version);
+ if (likely(IS_ERR(shadow) && PTR_ERR(shadow) == -ENOENT)) {
cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
cfs_hash_bd_unlock(hs, &bd, 1);

--
1.8.3.1


2018-05-02 18:22:57

by James Simmons

Subject: [PATCH 2/4] staging: lustre: obdclass: hoist locking in lu_context_exit()

From: "John L. Hammond" <[email protected]>

Hoist lu_keys_guard locking out of the for loop in lu_context_exit().

Signed-off-by: John L. Hammond <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8918
Reviewed-on: https://review.whamcloud.com/24217
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 04475e9..4e4dd58 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1711,10 +1711,11 @@ void lu_context_exit(struct lu_context *ctx)
LINVRNT(ctx->lc_state == LCS_ENTERED);
ctx->lc_state = LCS_LEFT;
if (ctx->lc_tags & LCT_HAS_EXIT && ctx->lc_value) {
+ /* could race with key quiescency */
+ if (ctx->lc_tags & LCT_REMEMBER)
+ read_lock(&lu_keys_guard);
+
for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
- /* could race with key quiescency */
- if (ctx->lc_tags & LCT_REMEMBER)
- read_lock(&lu_keys_guard);
if (ctx->lc_value[i]) {
struct lu_context_key *key;

@@ -1723,9 +1724,10 @@ void lu_context_exit(struct lu_context *ctx)
key->lct_exit(ctx,
key, ctx->lc_value[i]);
}
- if (ctx->lc_tags & LCT_REMEMBER)
- read_unlock(&lu_keys_guard);
}
+
+ if (ctx->lc_tags & LCT_REMEMBER)
+ read_unlock(&lu_keys_guard);
}
}
EXPORT_SYMBOL(lu_context_exit);
--
1.8.3.1


2018-05-02 18:23:07

by James Simmons

Subject: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

From: Li Xi <[email protected]>

Most of the time, keys are never changed, so an rwlock might give
better concurrency for key reads.
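For readers unfamiliar with the pattern, this is the read-mostly
usage the change trades on (a minimal sketch using the names from
this patch):

        static DEFINE_RWLOCK(lu_keys_guard);

        /* hot path: many readers may hold the lock concurrently */
        read_lock(&lu_keys_guard);
        /* ... look up keys ... */
        read_unlock(&lu_keys_guard);

        /* rare path: register/degister excludes all readers */
        write_lock(&lu_keys_guard);
        /* ... modify lu_keys[] ... */
        write_unlock(&lu_keys_guard);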

Signed-off-by: Li Xi <[email protected]>
Signed-off-by: Gu Zheng <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6800
Reviewed-on: http://review.whamcloud.com/15558
Reviewed-by: Faccini Bruno <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 38 +++++++++++-----------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index fa986f2..04475e9 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1317,7 +1317,7 @@ enum {

static struct lu_context_key *lu_keys[LU_CONTEXT_KEY_NR] = { NULL, };

-static DEFINE_SPINLOCK(lu_keys_guard);
+static DEFINE_RWLOCK(lu_keys_guard);
static atomic_t lu_key_initing_cnt = ATOMIC_INIT(0);

/**
@@ -1341,7 +1341,7 @@ int lu_context_key_register(struct lu_context_key *key)
LASSERT(key->lct_tags != 0);

result = -ENFILE;
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
if (!lu_keys[i]) {
key->lct_index = i;
@@ -1353,7 +1353,7 @@ int lu_context_key_register(struct lu_context_key *key)
break;
}
}
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
return result;
}
EXPORT_SYMBOL(lu_context_key_register);
@@ -1387,7 +1387,7 @@ void lu_context_key_degister(struct lu_context_key *key)
lu_context_key_quiesce(key);

++key_set_version;
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
key_fini(&lu_shrink_env.le_ctx, key->lct_index);

/**
@@ -1395,18 +1395,18 @@ void lu_context_key_degister(struct lu_context_key *key)
* run lu_context_key::lct_fini() method.
*/
while (atomic_read(&key->lct_used) > 1) {
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
CDEBUG(D_INFO, "%s: \"%s\" %p, %d\n",
__func__, module_name(key->lct_owner),
key, atomic_read(&key->lct_used));
schedule();
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
}
if (lu_keys[key->lct_index]) {
lu_keys[key->lct_index] = NULL;
lu_ref_fini(&key->lct_reference);
}
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);

LASSERTF(atomic_read(&key->lct_used) == 1,
"key has instances: %d\n",
@@ -1526,7 +1526,7 @@ void lu_context_key_quiesce(struct lu_context_key *key)
/*
* XXX memory barrier has to go here.
*/
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
key->lct_tags |= LCT_QUIESCENT;

/**
@@ -1534,19 +1534,19 @@ void lu_context_key_quiesce(struct lu_context_key *key)
* have completed.
*/
while (atomic_read(&lu_key_initing_cnt) > 0) {
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
CDEBUG(D_INFO, "%s: \"%s\" %p, %d (%d)\n",
__func__,
module_name(key->lct_owner),
key, atomic_read(&key->lct_used),
atomic_read(&lu_key_initing_cnt));
schedule();
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
}

list_for_each_entry(ctx, &lu_context_remembered, lc_remember)
key_fini(ctx, key->lct_index);
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
++key_set_version;
}
}
@@ -1584,9 +1584,9 @@ static int keys_fill(struct lu_context *ctx)
* An atomic_t variable is still used, in order not to reacquire the
* lock when decrementing the counter.
*/
- spin_lock(&lu_keys_guard);
+ read_lock(&lu_keys_guard);
atomic_inc(&lu_key_initing_cnt);
- spin_unlock(&lu_keys_guard);
+ read_unlock(&lu_keys_guard);

LINVRNT(ctx->lc_value);
for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
@@ -1655,9 +1655,9 @@ int lu_context_init(struct lu_context *ctx, __u32 tags)
ctx->lc_state = LCS_INITIALIZED;
ctx->lc_tags = tags;
if (tags & LCT_REMEMBER) {
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
list_add(&ctx->lc_remember, &lu_context_remembered);
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
} else {
INIT_LIST_HEAD(&ctx->lc_remember);
}
@@ -1683,10 +1683,10 @@ void lu_context_fini(struct lu_context *ctx)
keys_fini(ctx);

} else { /* could race with key degister */
- spin_lock(&lu_keys_guard);
+ write_lock(&lu_keys_guard);
keys_fini(ctx);
list_del_init(&ctx->lc_remember);
- spin_unlock(&lu_keys_guard);
+ write_unlock(&lu_keys_guard);
}
}
EXPORT_SYMBOL(lu_context_fini);
@@ -1714,7 +1714,7 @@ void lu_context_exit(struct lu_context *ctx)
for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
/* could race with key quiescency */
if (ctx->lc_tags & LCT_REMEMBER)
- spin_lock(&lu_keys_guard);
+ read_lock(&lu_keys_guard);
if (ctx->lc_value[i]) {
struct lu_context_key *key;

@@ -1724,7 +1724,7 @@ void lu_context_exit(struct lu_context *ctx)
key, ctx->lc_value[i]);
}
if (ctx->lc_tags & LCT_REMEMBER)
- spin_unlock(&lu_keys_guard);
+ read_unlock(&lu_keys_guard);
}
}
}
--
1.8.3.1


2018-05-02 18:23:47

by James Simmons

Subject: [PATCH 3/4] staging: lustre: obdclass: guarantee all keys filled

From: Hongchao Zhang <[email protected]>

In keys_fill, key_set_version could change after the keys are
filled; the keys in this context would then never be refilled by
a subsequent lu_context_refill, because its version is equal to
the current key_set_version.

In lu_context_refill, key_set_version should be protected by the
lock before comparing it to the version stored in the lu_context.
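An illustrative interleaving of the race (reconstructed from the
description above):

    keys_fill()                          lu_context_key_quiesce()
    -----------                          ------------------------
    fills ctx->lc_value[]
                                         key_fini() on remembered contexts
                                         ++key_set_version
    ctx->lc_version = key_set_version
        /* records the new version, so a later
         * lu_context_refill() sees a match and
         * never refills the now-stale keys */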

Signed-off-by: Hongchao Zhang <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8346
Reviewed-on: https://review.whamcloud.com/26099
Reviewed-on: https://review.whamcloud.com/27448
Reviewed-on: https://review.whamcloud.com/27994
Reviewed-by: Patrick Farrell <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Mike Pershin <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 29 +++++++++++++++++++---
1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 4e4dd58..8b507f1 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1386,8 +1386,8 @@ void lu_context_key_degister(struct lu_context_key *key)

lu_context_key_quiesce(key);

- ++key_set_version;
write_lock(&lu_keys_guard);
+ ++key_set_version;
key_fini(&lu_shrink_env.le_ctx, key->lct_index);

/**
@@ -1546,15 +1546,18 @@ void lu_context_key_quiesce(struct lu_context_key *key)

list_for_each_entry(ctx, &lu_context_remembered, lc_remember)
key_fini(ctx, key->lct_index);
- write_unlock(&lu_keys_guard);
+
++key_set_version;
+ write_unlock(&lu_keys_guard);
}
}

void lu_context_key_revive(struct lu_context_key *key)
{
+ write_lock(&lu_keys_guard);
key->lct_tags &= ~LCT_QUIESCENT;
++key_set_version;
+ write_unlock(&lu_keys_guard);
}

static void keys_fini(struct lu_context *ctx)
@@ -1573,6 +1576,7 @@ static void keys_fini(struct lu_context *ctx)

static int keys_fill(struct lu_context *ctx)
{
+ unsigned int pre_version;
unsigned int i;

/*
@@ -1586,8 +1590,10 @@ static int keys_fill(struct lu_context *ctx)
*/
read_lock(&lu_keys_guard);
atomic_inc(&lu_key_initing_cnt);
+ pre_version = key_set_version;
read_unlock(&lu_keys_guard);

+refill:
LINVRNT(ctx->lc_value);
for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
struct lu_context_key *key;
@@ -1628,9 +1634,17 @@ static int keys_fill(struct lu_context *ctx)
if (key->lct_exit)
ctx->lc_tags |= LCT_HAS_EXIT;
}
- ctx->lc_version = key_set_version;
}
+
+ read_lock(&lu_keys_guard);
+ if (pre_version != key_set_version) {
+ pre_version = key_set_version;
+ read_unlock(&lu_keys_guard);
+ goto refill;
+ }
+ ctx->lc_version = key_set_version;
atomic_dec(&lu_key_initing_cnt);
+ read_unlock(&lu_keys_guard);
return 0;
}

@@ -1739,7 +1753,14 @@ void lu_context_exit(struct lu_context *ctx)
*/
int lu_context_refill(struct lu_context *ctx)
{
- return likely(ctx->lc_version == key_set_version) ? 0 : keys_fill(ctx);
+ read_lock(&lu_keys_guard);
+ if (likely(ctx->lc_version == key_set_version)) {
+ read_unlock(&lu_keys_guard);
+ return 0;
+ }
+
+ read_unlock(&lu_keys_guard);
+ return keys_fill(ctx);
}

/**
--
1.8.3.1


2018-05-03 13:49:54

by David Laight

Subject: RE: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

From: James Simmons
> Sent: 02 May 2018 19:22
> From: Li Xi <[email protected]>
>
> Most of the time, keys are never changed, so an rwlock might give
> better concurrency for key reads.

OTOH unless there is contention on the spin lock during reads the
additional cost of a rwlock (probably double that of a spinlock)
will hurt performance.

...
> - spin_lock(&lu_keys_guard);
> + read_lock(&lu_keys_guard);
> atomic_inc(&lu_key_initing_cnt);
> - spin_unlock(&lu_keys_guard);
> + read_unlock(&lu_keys_guard);

WTF, seems unlikely that you need to hold any kind of lock
over an atomic_inc().

If this is just ensuring that no code holds the lock then
it would need to request the write_lock().
(and would need a comment)

David


2018-05-03 23:27:20

by NeilBrown

Subject: RE: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

On Thu, May 03 2018, David Laight wrote:

> From: James Simmons
>> Sent: 02 May 2018 19:22
>> From: Li Xi <[email protected]>
>>
>> Most of the time, keys are never changed, so an rwlock might give
>> better concurrency for key reads.
>
> OTOH unless there is contention on the spin lock during reads the
> additional cost of a rwlock (probably double that of a spinlock)
> will hurt performance.

That's roughly what I was going to say - rwlocks are rarely a win.
I think the second patch, which caused the lock to be taken less often,
would have a bigger impact than the switch to rwlocks.

However I suspect a better approach would be to investigate some sort of
lockless solution.
I think the use of the spinlock in lu_context_key_register() could be
replaced with a careful cmpxchg(). I'm less sure about
lu_context_key_degister(), but it might be possible.
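For the register side, such a lockless claim might look roughly like
this (an untested sketch; it assumes empty lu_keys[] slots are only
ever claimed through this path):

        int i;

        for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
                if (lu_keys[i])
                        continue;
                /* atomically claim the empty slot; on a lost
                 * race just move on to the next candidate */
                if (cmpxchg(&lu_keys[i], NULL, key) == NULL) {
                        key->lct_index = i;
                        return 0;
                }
        }
        return -ENFILE;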

>
> ...
>> - spin_lock(&lu_keys_guard);
>> + read_lock(&lu_keys_guard);
>> atomic_inc(&lu_key_initing_cnt);
>> - spin_unlock(&lu_keys_guard);
>> + read_unlock(&lu_keys_guard);
>
> WTF, seems unlikely that you need to hold any kind of lock
> over an atomic_inc().
>
> If this is just ensuring that no code holds the lock then
> it would need to request the write_lock().
> (and would need a comment)

There is a comment - that patch showed the last 2 lines of it.
This is for synchronization with lu_context_key_quiesce().
That spins (!! calling schedule(), but still... not good) until
the lu_key_initing_cnt is zero while it holds the write lock.
Then it is sure that the code protected by this counter isn't
running.
I'm sure this can be improved! I would need to study it carefully to
see how.
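For reference, the two halves of that handshake, reduced from the
patched code:

        /* keys_fill(): announce an initialisation in progress */
        read_lock(&lu_keys_guard);
        atomic_inc(&lu_key_initing_cnt);
        read_unlock(&lu_keys_guard);

        /* lu_context_key_quiesce(): the write lock excludes any new
         * announcement above, then in-flight fills are drained */
        write_lock(&lu_keys_guard);
        key->lct_tags |= LCT_QUIESCENT;
        while (atomic_read(&lu_key_initing_cnt) > 0) {
                write_unlock(&lu_keys_guard);
                schedule();
                write_lock(&lu_keys_guard);
        }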

Note that I don't object to these patches going in - if they provide a
measurable improvement which seems likely, then in they go. But I
hope the code won't stay like this long term.

Thanks,
NeilBrown



2018-05-04 00:13:26

by Dilger, Andreas

Subject: Re: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

On May 3, 2018, at 07:50, David Laight <[email protected]> wrote:
>
> From: James Simmons
>> Sent: 02 May 2018 19:22
>> From: Li Xi <[email protected]>
>>
>> Most of the time, keys are never changed, so an rwlock might give
>> better concurrency for key reads.
>
> OTOH unless there is contention on the spin lock during reads the
> additional cost of a rwlock (probably double that of a spinlock)
> will hurt performance.
>
> ...
>> - spin_lock(&lu_keys_guard);
>> + read_lock(&lu_keys_guard);
>> atomic_inc(&lu_key_initing_cnt);
>> - spin_unlock(&lu_keys_guard);
>> + read_unlock(&lu_keys_guard);
>
> WTF, seems unlikely that you need to hold any kind of lock
> over an atomic_inc().
>
> If this is just ensuring that no code holds the lock then
> it would need to request the write_lock().
> (and would need a comment)

There was a fair amount of benchmarking done for this that shows the
performance is significantly improved with the patch, which can be
seen in the ticket that was referenced in the original commit comment:

https://jira.hpdd.intel.com/browse/LU-6800?focusedCommentId=121776#comment-121776

That said, it might be good to include this information in the
commit comment itself.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-05-04 00:54:12

by NeilBrown

Subject: Re: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

On Fri, May 04 2018, Dilger, Andreas wrote:

> On May 3, 2018, at 07:50, David Laight <[email protected]> wrote:
>>
>> From: James Simmons
>>> Sent: 02 May 2018 19:22
>>> From: Li Xi <[email protected]>
>>>
>>> Most of the time, keys are never changed, so an rwlock might give
>>> better concurrency for key reads.
>>
>> OTOH unless there is contention on the spin lock during reads the
>> additional cost of a rwlock (probably double that of a spinlock)
>> will hurt performance.
>>
>> ...
>>> - spin_lock(&lu_keys_guard);
>>> + read_lock(&lu_keys_guard);
>>> atomic_inc(&lu_key_initing_cnt);
>>> - spin_unlock(&lu_keys_guard);
>>> + read_unlock(&lu_keys_guard);
>>
>> WTF, seems unlikely that you need to hold any kind of lock
>> over an atomic_inc().
>>
>> If this is just ensuring that no code holds the lock then
>> it would need to request the write_lock().
>> (and would need a comment)
>
> There was a fair amount of benchmarking done for this that shows the
> performance is significantly improved with the patch, which can be
> seen in the ticket that was referenced in the original commit comment:
>
> https://jira.hpdd.intel.com/browse/LU-6800?focusedCommentId=121776#comment-121776

That does surprise me. The only places where the lock is held for read
are very short - clearing a few fields or incrementing a value.
But numbers don't lie.
I wonder if the next patch would have had just as big an effect. Taking
and dropping the lock 40 times is not likely to be good for performance.

Thanks,
NeilBrown


>
> That said, it might be good to include this information in the
> commit comment itself.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation



2018-05-04 01:17:41

by NeilBrown

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Wed, May 02 2018, James Simmons wrote:

> From: Lai Siyao <[email protected]>
>
> Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
> to remove it from the cache, but this may lead to deadlock, because
> when another process looks up such an object, it has to wait for the
> object to be released (done at the last refcount put), while that
> process may already hold an LDLM lock.
>
> Now that the current code can handle dying objects correctly, we can
> simply return such an object from lookup, and the above deadlock is
> avoided.

I think one of the reasons that I didn't apply this to mainline myself
is that "Now that" comment. When is the "now" that it is referring to?
Are we sure that all code in mainline "can handle dying objects
correctly"??


>
> Signed-off-by: Lai Siyao <[email protected]>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9049
> Reviewed-on: https://review.whamcloud.com/26965
> Reviewed-by: Alex Zhuravlev <[email protected]>
> Tested-by: Cliff White <[email protected]>
> Reviewed-by: Fan Yong <[email protected]>
> Reviewed-by: Oleg Drokin <[email protected]>
> Signed-off-by: James Simmons <[email protected]>
> ---
> drivers/staging/lustre/lustre/include/lu_object.h | 2 +-
> drivers/staging/lustre/lustre/obdclass/lu_object.c | 82 +++++++++-------------
> 2 files changed, 36 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
> index f29bbca..232063a 100644
> --- a/drivers/staging/lustre/lustre/include/lu_object.h
> +++ b/drivers/staging/lustre/lustre/include/lu_object.h
> @@ -673,7 +673,7 @@ static inline void lu_object_get(struct lu_object *o)
> }
>
> /**
> - * Return true of object will not be cached after last reference to it is
> + * Return true if object will not be cached after last reference to it is
> * released.
> */
> static inline int lu_object_is_dying(const struct lu_object_header *h)
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index 8b507f1..9311703 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -589,19 +589,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> const struct lu_fid *f,
> __u64 *version)
> {
> - struct cfs_hash *hs = s->ls_obj_hash;
> struct lu_site_bkt_data *bkt;
> struct lu_object_header *h;
> struct hlist_node *hnode;
> - __u64 ver;
> - wait_queue_entry_t waiter;
> + u64 ver = cfs_hash_bd_version_get(bd);
>
> -retry:
> - ver = cfs_hash_bd_version_get(bd);
> -
> - if (*version == ver) {
> + if (*version == ver)
> return ERR_PTR(-ENOENT);
> - }
>
> *version = ver;
> bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
> @@ -615,31 +609,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> }
>
> h = container_of(hnode, struct lu_object_header, loh_hash);
> - if (likely(!lu_object_is_dying(h))) {
> - cfs_hash_get(s->ls_obj_hash, hnode);
> - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
> - if (!list_empty(&h->loh_lru)) {
> - list_del_init(&h->loh_lru);
> - percpu_counter_dec(&s->ls_lru_len_counter);
> - }
> - return lu_object_top(h);
> + cfs_hash_get(s->ls_obj_hash, hnode);
> + lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
> + if (!list_empty(&h->loh_lru)) {
> + list_del_init(&h->loh_lru);
> + percpu_counter_dec(&s->ls_lru_len_counter);
> }
> -
> - /*
> - * Lookup found an object being destroyed this object cannot be
> - * returned (to assure that references to dying objects are eventually
> - * drained), and moreover, lookup has to wait until object is freed.
> - */
> -
> - init_waitqueue_entry(&waiter, current);
> - add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> - set_current_state(TASK_UNINTERRUPTIBLE);
> - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
> - cfs_hash_bd_unlock(hs, bd, 1);
> - schedule();
> - remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> - cfs_hash_bd_lock(hs, bd, 1);
> - goto retry;
> + return lu_object_top(h);
> }
>
> /**
> @@ -680,6 +656,8 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
> }
>
> /**
> + * Core logic of lu_object_find*() functions.
> + *
> * Much like lu_object_find(), but top level device of object is specifically
> * \a dev rather than top level device of the site. This interface allows
> * objects of different "stacking" to be created within the same site.
> @@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
> * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
> * just alloc and insert directly.
> *
> + * If dying object is found during index search, add @waiter to the
> + * site wait-queue and return ERR_PTR(-EAGAIN).

It seems odd to add this comment here, when it seems to describe code
that is being removed.
I can see that this comment is added by the upstream patch
Commit: fa14bdf6b648 ("LU-9049 obdclass: change object lookup to no wait mode")
but I cannot see what it refers to.

Otherwise that patch looks good.

Thanks,
NeilBrown



2018-05-07 13:16:33

by Greg KH

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Wed, May 02, 2018 at 02:21:48PM -0400, James Simmons wrote:
> From: Lai Siyao <[email protected]>
>
> Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
> to remove it from the cache, but this may lead to deadlock, because
> when another process looks up such an object, it has to wait for the
> object to be released (done at the last refcount put), while that
> process may already hold an LDLM lock.
>
> Now that the current code can handle dying objects correctly, we can
> simply return such an object from lookup, and the above deadlock is
> avoided.
>
> Signed-off-by: Lai Siyao <[email protected]>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9049
> Reviewed-on: https://review.whamcloud.com/26965
> Reviewed-by: Alex Zhuravlev <[email protected]>
> Tested-by: Cliff White <[email protected]>
> Reviewed-by: Fan Yong <[email protected]>
> Reviewed-by: Oleg Drokin <[email protected]>
> Signed-off-by: James Simmons <[email protected]>
> ---
> drivers/staging/lustre/lustre/include/lu_object.h | 2 +-
> drivers/staging/lustre/lustre/obdclass/lu_object.c | 82 +++++++++-------------
> 2 files changed, 36 insertions(+), 48 deletions(-)

Patch does not apply to my tree :(

2018-05-08 11:46:25

by Dan Carpenter

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

> /*
> * Allocate new object. This may result in rather complicated
> * operations, including fld queries, inode loading, etc.
> */
> o = lu_object_alloc(env, dev, f, conf);
> - if (IS_ERR(o))
> + if (unlikely(IS_ERR(o)))
> return o;
>

This is unrelated and totally pointless. likely/unlikely annotations
hurt readability, and they should only be added if it's something which
is going to show up in benchmarking. lu_object_alloc() is already too
slow for the unlikely() to make a difference, and anyway IS_ERR() has an
unlikely built in, so it's duplicative...
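For reference, IS_ERR() in include/linux/err.h already reads
approximately:

        #define MAX_ERRNO       4095

        #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

        static inline bool __must_check IS_ERR(__force const void *ptr)
        {
                return IS_ERR_VALUE((unsigned long)ptr);
        }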

Anyway, I understand that Intel has been ignoring kernel.org instead of
forwarding their patches properly, so you're doing a difficult and
thankless job... Thanks for that. I'm sure it's frustrating for you
to look at these patches as well.

regards,
dan carpenter


2018-05-15 00:38:00

by James Simmons

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode


> On Wed, May 02 2018, James Simmons wrote:
>
> > From: Lai Siyao <[email protected]>
> >
> > Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
> > to remove it from the cache, but this may lead to deadlock, because
> > when another process looks up such an object, it has to wait for the
> > object to be released (done at the last refcount put), while that
> > process may already hold an LDLM lock.
> >
> > Now that the current code can handle dying objects correctly, we can
> > simply return such an object from lookup, and the above deadlock is
> > avoided.
>
> I think one of the reasons that I didn't apply this to mainline myself
> is that "Now that" comment. When is the "now" that it is referring to?
> Are we sure that all code in mainline "can handle dying objects
> correctly"??

So I talked to Lai, and he posted in the LU-9049 ticket which patches need
to land before this one. Only one patch is of concern, and it's for LU-9203,
which doesn't apply to the staging tree since we don't have the LNet SMP
updates in our tree. I saved notes about making sure LU-9203 lands
together with the future LNet SMP changes. As it stands it is safe to
land in staging.

> > Signed-off-by: Lai Siyao <[email protected]>
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9049
> > Reviewed-on: https://review.whamcloud.com/26965
> > Reviewed-by: Alex Zhuravlev <[email protected]>
> > Tested-by: Cliff White <[email protected]>
> > Reviewed-by: Fan Yong <[email protected]>
> > Reviewed-by: Oleg Drokin <[email protected]>
> > Signed-off-by: James Simmons <[email protected]>
> > ---
> > drivers/staging/lustre/lustre/include/lu_object.h | 2 +-
> > drivers/staging/lustre/lustre/obdclass/lu_object.c | 82 +++++++++-------------
> > 2 files changed, 36 insertions(+), 48 deletions(-)
> >
> > diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
> > index f29bbca..232063a 100644
> > --- a/drivers/staging/lustre/lustre/include/lu_object.h
> > +++ b/drivers/staging/lustre/lustre/include/lu_object.h
> > @@ -673,7 +673,7 @@ static inline void lu_object_get(struct lu_object *o)
> > }
> >
> > /**
> > - * Return true of object will not be cached after last reference to it is
> > + * Return true if object will not be cached after last reference to it is
> > * released.
> > */
> > static inline int lu_object_is_dying(const struct lu_object_header *h)
> > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > index 8b507f1..9311703 100644
> > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > @@ -589,19 +589,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> > const struct lu_fid *f,
> > __u64 *version)
> > {
> > - struct cfs_hash *hs = s->ls_obj_hash;
> > struct lu_site_bkt_data *bkt;
> > struct lu_object_header *h;
> > struct hlist_node *hnode;
> > - __u64 ver;
> > - wait_queue_entry_t waiter;
> > + u64 ver = cfs_hash_bd_version_get(bd);
> >
> > -retry:
> > - ver = cfs_hash_bd_version_get(bd);
> > -
> > - if (*version == ver) {
> > + if (*version == ver)
> > return ERR_PTR(-ENOENT);
> > - }
> >
> > *version = ver;
> > bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
> > @@ -615,31 +609,13 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> > }
> >
> > h = container_of(hnode, struct lu_object_header, loh_hash);
> > - if (likely(!lu_object_is_dying(h))) {
> > - cfs_hash_get(s->ls_obj_hash, hnode);
> > - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
> > - if (!list_empty(&h->loh_lru)) {
> > - list_del_init(&h->loh_lru);
> > - percpu_counter_dec(&s->ls_lru_len_counter);
> > - }
> > - return lu_object_top(h);
> > + cfs_hash_get(s->ls_obj_hash, hnode);
> > + lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
> > + if (!list_empty(&h->loh_lru)) {
> > + list_del_init(&h->loh_lru);
> > + percpu_counter_dec(&s->ls_lru_len_counter);
> > }
> > -
> > - /*
> > - * Lookup found an object being destroyed this object cannot be
> > - * returned (to assure that references to dying objects are eventually
> > - * drained), and moreover, lookup has to wait until object is freed.
> > - */
> > -
> > - init_waitqueue_entry(&waiter, current);
> > - add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> > - set_current_state(TASK_UNINTERRUPTIBLE);
> > - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
> > - cfs_hash_bd_unlock(hs, bd, 1);
> > - schedule();
> > - remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> > - cfs_hash_bd_lock(hs, bd, 1);
> > - goto retry;
> > + return lu_object_top(h);
> > }
> >
> > /**
> > @@ -680,6 +656,8 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
> > }
> >
> > /**
> > + * Core logic of lu_object_find*() functions.
> > + *
> > * Much like lu_object_find(), but top level device of object is specifically
> > * \a dev rather than top level device of the site. This interface allows
> > * objects of different "stacking" to be created within the same site.
> > @@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
> > * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
> > * just alloc and insert directly.
> > *
> > + * If dying object is found during index search, add @waiter to the
> > + * site wait-queue and return ERR_PTR(-EAGAIN).
>
> It seems odd to add this comment here, when it seems to describe code
> that is being removed.
> I can see that this comment is added by the upstream patch
> Commit: fa14bdf6b648 ("LU-9049 obdclass: change object lookup to no wait mode")
> but I cannot see what it refers to.
>
> Otherwise that patch looks good.
>
> Thanks,
> NeilBrown
>

2018-05-15 01:38:35

by NeilBrown

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Tue, May 15 2018, James Simmons wrote:

>> On Wed, May 02 2018, James Simmons wrote:
>>
>> > From: Lai Siyao <[email protected]>
>> >
>> > Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
>> > to remove it from the cache, but this may lead to deadlock, because
>> > when another process looks up such an object, it has to wait for the
>> > object to be released (done at the last refcount put), while that
>> > process may already hold an LDLM lock.
>> >
>> > Now that the current code can handle dying objects correctly, we can
>> > simply return such an object from lookup, and the above deadlock is
>> > avoided.
>>
>> I think one of the reasons that I didn't apply this to mainline myself
>> is that "Now that" comment. When is the "now" that it is referring to?
>> Are we sure that all code in mainline "can handle dying objects
>> correctly"??
>
> So I talked to Lai, and he posted in the LU-9049 ticket which patches need
> to land before this one. Only one patch is of concern, and it's for LU-9203,
> which doesn't apply to the staging tree since we don't have the LNet SMP
> updates in our tree. I saved notes about making sure LU-9203 lands
> together with the future LNet SMP changes. As it stands it is safe to
> land in staging.

Thanks a lot for looking into this. Nice to have the safety of this
change confirmed.

What do you think of:

>> > @@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
>> > * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
>> > * just alloc and insert directly.
>> > *
>> > + * If dying object is found during index search, add @waiter to the
>> > + * site wait-queue and return ERR_PTR(-EAGAIN).
>>
>> It seems odd to add this comment here, when it seems to describe code
>> that is being removed.
>> I can see that this comment is added by the upstream patch
>> Commit: fa14bdf6b648 ("LU-9049 obdclass: change object lookup to no wait mode")
>> but I cannot see what it refers to.
>>

??

Am I misunderstanding something, or is that comment wrong?

Thanks,
NeilBrown



2018-05-15 02:12:38

by James Simmons

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode


> >> On Wed, May 02 2018, James Simmons wrote:
> >>
> >> > From: Lai Siyao <[email protected]>
> >> >
> >> > Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
> >> > to remove it from the cache, but this may lead to deadlock, because
> >> > when another process looks up such an object, it has to wait for the
> >> > object to be released (done at the last refcount put), while that
> >> > process may already hold an LDLM lock.
> >> >
> >> > Now that the current code can handle dying objects correctly, we can
> >> > simply return such an object from lookup, and the above deadlock is
> >> > avoided.
> >>
> >> I think one of the reasons that I didn't apply this to mainline myself
> >> is that "Now that" comment. When is the "now" that it is referring to?
> >> Are we sure that all code in mainline "can handle dying objects
> >> correctly"??
> >
> > So I talked to Lai, and he posted in the LU-9049 ticket which patches need
> > to land before this one. Only one patch is of concern, and it's for LU-9203,
> > which doesn't apply to the staging tree since we don't have the LNet SMP
> > updates in our tree. I saved notes about making sure LU-9203 lands
> > together with the future LNet SMP changes. As it stands it is safe to
> > land in staging.
>
> Thanks a lot for looking into this. Nice to have the safety of this
> change confirmed.
>
> What do you think of:
>
> >> > @@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
> >> > * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
> >> > * just alloc and insert directly.
> >> > *
> >> > + * If dying object is found during index search, add @waiter to the
> >> > + * site wait-queue and return ERR_PTR(-EAGAIN).
> >>
> >> It seems odd to add this comment here, when it seems to describe code
> >> that is being removed.
> >> I can see that this comment is added by the upstream patch
> >> Commit: fa14bdf6b648 ("LU-9049 obdclass: change object lookup to no wait mode")
> >> but I cannot see what it refers to.
> >>
>
> ??
>
> Am I misunderstanding something, or is that comment wrong?

I think the comment is wrong. That comment was in the other tree before
the patch was landed. It got included with this push due to me diffing the
tree by accident. I will remove it with the next push.

2018-05-15 15:48:14

by James Simmons

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode


> > /*
> > * Allocate new object. This may result in rather complicated
> > * operations, including fld queries, inode loading, etc.
> > */
> > o = lu_object_alloc(env, dev, f, conf);
> > - if (IS_ERR(o))
> > + if (unlikely(IS_ERR(o)))
> > return o;
> >
>
> This is unrelated and totally pointless. likely/unlikely annotations
> hurt readability, and they should only be added if it's something which
> is going to show up in benchmarking. lu_object_alloc() is already too
> slow for the unlikely() to make a difference and anyway IS_ERR() has an
> unlikely built in so it's duplicative...

Sounds like a good checkpatch case to test for :-) Some people like to try
and milk every cycle they can. Personally I never use those
annotations; with modern processors I'm skeptical of their benefits.
I do clean up the patches to some extent to make them compliant with
kernel standards, but leave the core code in place for people to comment
on.

> Anyway, I understand that Intel has been ignoring kernel.org instead of
> forwarding their patches properly, so you're doing a difficult and
> thankless job... Thanks for that. I'm sure it's frustrating for you
> to look at these patches as well.

Thank you for the compliment. Also thank you for taking the time to review
these patches. Your feedback is most welcome and beneficial to the
health of the lustre client.

Sadly it's not just Intel but other vendors that don't directly contribute
to the linux lustre client. I have spoken to the vendors about contributing
and they all say the same thing: no working with drivers in the staging
tree. Yet all the parties involved are very interested in the success
of the lustre client. No one has ever told me directly why they don't get
involved, but I suspect it comes down to two reasons. One is that staging
drivers are not normally enabled by distributions, so their customers
will normally never deal with the staging lustre client. Secondly, vendors
just lack the manpower to contribute in a meaningful way.

2018-05-16 08:01:31

by Dan Carpenter

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Tue, May 15, 2018 at 04:02:55PM +0100, James Simmons wrote:
>
> > > /*
> > > * Allocate new object. This may result in rather complicated
> > > * operations, including fld queries, inode loading, etc.
> > > */
> > > o = lu_object_alloc(env, dev, f, conf);
> > > - if (IS_ERR(o))
> > > + if (unlikely(IS_ERR(o)))
> > > return o;
> > >
> >
> > This is unrelated and totally pointless. likely/unlikely annotations
> > hurt readability, and they should only be added if it's something which
> > is going to show up in benchmarking. lu_object_alloc() is already too
> > slow for the unlikely() to make a difference and anyway IS_ERR() has an
> > unlikely built in so it's duplicative...
>
> Sounds like a good checkpatch case to test for :-)

The likely/unlikely annotations have their place in fast paths so a
checkpatch warning would get annoying...

regards,
dan carpenter


2018-05-16 09:15:01

by Dilger, Andreas

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On May 16, 2018, at 02:00, Dan Carpenter <[email protected]> wrote:
>
> On Tue, May 15, 2018 at 04:02:55PM +0100, James Simmons wrote:
>>
>>>> /*
>>>> * Allocate new object. This may result in rather complicated
>>>> * operations, including fld queries, inode loading, etc.
>>>> */
>>>> o = lu_object_alloc(env, dev, f, conf);
>>>> - if (IS_ERR(o))
>>>> + if (unlikely(IS_ERR(o)))
>>>> return o;
>>>>
>>>
>>> This is unrelated and totally pointless. likely/unlikely annotations
>>> hurt readability, and they should only be added if it's something which
>>> is going to show up in benchmarking. lu_object_alloc() is already too
>>> slow for the unlikely() to make a difference and anyway IS_ERR() has an
>>> unlikely built in so it's duplicative...
>>
>> Sounds like a good checkpatch case to test for :-)
>
> The likely/unlikely annotations have their place in fast paths so a
> checkpatch warning would get annoying...

I think James was suggesting a check for unlikely(IS_ERR()), or possibly
a check for unlikely() on something that is already unlikely() after CPP
expansion.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-05-16 15:46:04

by Joe Perches

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Wed, 2018-05-16 at 09:12 +0000, Dilger, Andreas wrote:
> On May 16, 2018, at 02:00, Dan Carpenter <[email protected]> wrote:
> >
> > On Tue, May 15, 2018 at 04:02:55PM +0100, James Simmons wrote:
> > >
> > > > > /*
> > > > > * Allocate new object. This may result in rather complicated
> > > > > * operations, including fld queries, inode loading, etc.
> > > > > */
> > > > > o = lu_object_alloc(env, dev, f, conf);
> > > > > - if (IS_ERR(o))
> > > > > + if (unlikely(IS_ERR(o)))
> > > > > return o;
> > > > >
> > > >
> > > > This is unrelated and totally pointless. likely/unlikely annotations
> > > > hurt readability, and they should only be added if it's something which
> > > > is going to show up in benchmarking. lu_object_alloc() is already too
> > > > slow for the unlikely() to make a difference and anyway IS_ERR() has an
> > > > unlikely built in so it's duplicative...
> > >
> > > Sounds like a good checkpatch case to test for :-)
> >
> > The likely/unlikely annotations have their place in fast paths so a
> > checkpatch warning would get annoying...
>
> I think James was suggesting a check for unlikely(IS_ERR()),

Probably so.

$ git grep -P 'likely\s*\(\s*\!?\s*IS_ERR' | wc -l
42

Are there other known likely/unlikely duplications?

> or possibly
> a check for unlikely() on something that is already unlikely() after CPP
> expansion.

checkpatch isn't the tool for that type of test
as it is a collection of trivial regex tests and
it is not a c90 preprocessor.

Anyway, here's a possible checkpatch patch.
---
scripts/checkpatch.pl | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index baddac9379f0..20c0973f1c39 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6299,6 +6299,12 @@ sub process {
"#define of '$1' is wrong - use Kconfig variables or standard guards instead\n" . $herecurr);
}

+# likely/unlikely tests with IS_ERR (already unlikely)
+ if ($line =~ /\b((?:un)?likely)\s*\(\s*\!?\s*(IS_ERR[A-Z_]*)\s*\(/) {
+ WARN("DUPLICATE_LIKELY",
+ "Unnecessary use of $1 with $2 as it already has an unlikely\n" . $herecurr);
+ }
+
# likely/unlikely comparisons similar to "(likely(foo) > 0)"
if ($^V && $^V ge 5.10.0 &&
$line =~ /\b((?:un)?likely)\s*\(\s*$FuncArg\s*\)\s*$Compare/) {
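With that applied, a patch containing the hunk quoted above would
trigger something like the following (hypothetical file and line
numbers):

        $ ./scripts/checkpatch.pl 0004-example.patch
        WARNING: Unnecessary use of unlikely with IS_ERR as it already has an unlikely
        #NN: FILE: drivers/staging/lustre/lustre/obdclass/lu_object.c:NN:
        +	if (unlikely(IS_ERR(o)))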



2018-05-16 16:58:03

by Greg KH

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

On Tue, May 15, 2018 at 04:02:55PM +0100, James Simmons wrote:
> > > /*
> > > * Allocate new object. This may result in rather complicated
> > > * operations, including fld queries, inode loading, etc.
> > > */
> > > o = lu_object_alloc(env, dev, f, conf);
> > > - if (IS_ERR(o))
> > > + if (unlikely(IS_ERR(o)))
> > > return o;
> > >
> >
> > This is unrelated and totally pointless. likely/unlikely annotations
> > hurt readability, and they should only be added if it's something which
> > is going to show up in benchmarking. lu_object_alloc() is already too
> > slow for the unlikely() to make a difference and anyway IS_ERR() has an
> > unlikely built in so it's duplicative...
>
> Sounds like a good checkpatch case to test for :-) Some people like to try
> and milk every cycle they can. Personally I never use those
> annotations; with modern processors I'm skeptical of their benefits.
> I do clean up the patches to some extent to make them compliant with
> kernel standards, but leave the core code in place for people to comment
> on.
>
> > Anyway, I understand that Intel has been ignoring kernel.org instead of
> > forwarding their patches properly, so you're doing a difficult and
> > thankless job... Thanks for that. I'm sure it's frustrating for you
> > to look at these patches as well.
>
> Thank you for the compliment. Also thank you for taking the time to review
> these patches. Your feedback is most welcome and beneficial to the
> health of the lustre client.
>
> Sadly it's not just Intel but other vendors that don't directly contribute
> to the linux lustre client. I have spoken to the vendors about contributing
> and they all say the same thing: no working with drivers in the staging
> tree. Yet all the parties involved are very interested in the success
> of the lustre client. No one has ever told me directly why they don't get
> involved, but I suspect it comes down to two reasons. One is that staging
> drivers are not normally enabled by distributions, so their customers
> will normally never deal with the staging lustre client. Secondly, vendors
> just lack the manpower to contribute in a meaningful way.

If staging is hurting you, why is it in staging at all? Why not just
drop it, go off and spend a few months to clean up all the issues in
your own tree (with none of those pesky requirements of easy-to-review
patches) and then submit a "clean" filesystem for inclusion in the
"real" part of the kernel tree?

It doesn't sound like anyone is actually using this code in the tree
as-is, so why even keep it here?

thanks,

greg k-h

2018-05-17 05:08:03

by James Simmons

Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode


> > > Anyway, I understand that Intel has been ignoring kernel.org instead of
> > > forwarding their patches properly, so you're doing a difficult and
> > > thankless job... Thanks for that. I'm sure it's frustrating for you
> > > to look at these patches as well.
> >
> > Thank you for the compliment. Also thank you for taking the time to review
> > these patches. Your feedback is most welcome and beneficial to the
> > health of the lustre client.
> >
> > Sadly it's not just Intel but other vendors that don't directly contribute
> > to the linux lustre client. I have spoken to the vendors about contributing
> > and they all say the same thing: no working with drivers in the staging
> > tree. Yet all the parties involved are very interested in the success
> > of the lustre client. No one has ever told me directly why they don't get
> > involved, but I suspect it comes down to two reasons. One is that staging
> > drivers are not normally enabled by distributions, so their customers
> > will normally never deal with the staging lustre client. Secondly, vendors
> > just lack the manpower to contribute in a meaningful way.
>
> If staging is hurting you, why is it in staging at all? Why not just
> drop it, go off and spend a few months to clean up all the issues in
> your own tree (with none of those pesky requirements of easy-to-review
> patches) and then submit a "clean" filesystem for inclusion in the
> "real" part of the kernel tree?
>
> It doesn't sound like anyone is actually using this code in the tree
> as-is, so why even keep it here?

I never said being in staging is hurting the progression of Lustre. In
fact it is the exact opposite, otherwise I wouldn't be active in this
work. What I was pointing out to Dan was that many vendors are reluctant
to participate in broader open source development of this type.

The whole point of this is to evolve Lustre into a proper open source
project not dependent on vendors for survival. Several years ago Lustre
changed hands several times and the HPC community was worried about its
survival. Various institutions banded together to raise the resources to
keep it alive. Over time Lustre has been migrating to a more open source
community effort. An awesome example is the work the University of Indiana
did for the sptlrpc layer. Now we see efforts expanding into the realm of
the linux lustre client. In fact, HPC sites that are community members are
testing and running the linux client. In spite of the lack of vendor
involvement the linux lustre client is making excellent progress. How
often do you see style patches anymore? The headers are properly split
between userspace UAPI headers and kernel space. One of the major barriers
to leaving staging was the lack of a strong presence to continue moving
the lustre client forward. That is no longer the case. The finish line is
in view.