2018-05-01 03:53:29

by NeilBrown

Subject: [PATCH 00/10] staging: lustre: assorted improvements.

The first 6 patches are clean-up patches that I pulled out
of my rhashtable series. I think they stand alone as
good cleanups, and having them upstream makes the rhashtable
series shorter, which eases further review.

The next 2 are revised versions of patches I sent previously
that conflicted with other patches that landed first.

The last is a bugfix for an issue James mentioned a while back.

Thanks,
NeilBrown


---

NeilBrown (10):
staging: lustre: ldlm: store name directly in namespace.
staging: lustre: make struct lu_site_bkt_data private
staging: lustre: lu_object: discard extra lru count.
staging: lustre: lu_object: move retry logic inside htable_lookup
staging: lustre: fold lu_object_new() into lu_object_find_at()
staging: lustre: llite: use more private data in dump_pgcache
staging: lustre: llite: remove redundant lookup in dump_pgcache
staging: lustre: move misc-device registration closer to related code.
staging: lustre: move remaining code from linux-module.c to module.c
staging: lustre: fix error deref in ll_splice_alias().


.../staging/lustre/include/linux/libcfs/libcfs.h | 6 -
drivers/staging/lustre/lnet/libcfs/Makefile | 1
.../lustre/lnet/libcfs/linux/linux-module.c | 196 --------------------
drivers/staging/lustre/lnet/libcfs/module.c | 162 ++++++++++++++++-
drivers/staging/lustre/lustre/include/lu_object.h | 36 ----
drivers/staging/lustre/lustre/include/lustre_dlm.h | 5 -
drivers/staging/lustre/lustre/ldlm/ldlm_resource.c | 5 +
drivers/staging/lustre/lustre/llite/lcommon_cl.c | 8 -
drivers/staging/lustre/lustre/llite/namei.c | 8 +
drivers/staging/lustre/lustre/llite/vvp_dev.c | 169 ++++++++---------
drivers/staging/lustre/lustre/lov/lov_object.c | 8 -
drivers/staging/lustre/lustre/obdclass/lu_object.c | 169 ++++++++---------
12 files changed, 341 insertions(+), 432 deletions(-)
delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-module.c




2018-05-01 03:53:43

by NeilBrown

Subject: [PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.

Rather than storing the name of a namespace in the
hash table, store it directly in the namespace.
This will allow the hashtable to be changed to use
rhashtable.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/include/lustre_dlm.h | 5 ++++-
drivers/staging/lustre/lustre/ldlm/ldlm_resource.c | 5 +++++
2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index d668d86423a4..b3532adac31c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -362,6 +362,9 @@ struct ldlm_namespace {
/** Flag indicating if namespace is on client instead of server */
enum ldlm_side ns_client;

+ /** name of this namespace */
+ char *ns_name;
+
/** Resource hash table for namespace. */
struct cfs_hash *ns_rs_hash;

@@ -878,7 +881,7 @@ static inline bool ldlm_has_layout(struct ldlm_lock *lock)
static inline char *
ldlm_ns_name(struct ldlm_namespace *ns)
{
- return ns->ns_rs_hash->hs_name;
+ return ns->ns_name;
}

static inline struct ldlm_namespace *
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 6c615b6e9bdc..43bbc5fd94cc 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -688,6 +688,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
ns->ns_obd = obd;
ns->ns_appetite = apt;
ns->ns_client = client;
+ ns->ns_name = kstrdup(name, GFP_KERNEL);
+ if (!ns->ns_name)
+ goto out_hash;

INIT_LIST_HEAD(&ns->ns_list_chain);
INIT_LIST_HEAD(&ns->ns_unused_list);
@@ -730,6 +733,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
ldlm_namespace_sysfs_unregister(ns);
ldlm_namespace_cleanup(ns, 0);
out_hash:
+ kfree(ns->ns_name);
cfs_hash_putref(ns->ns_rs_hash);
out_ns:
kfree(ns);
@@ -993,6 +997,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns)
ldlm_namespace_debugfs_unregister(ns);
ldlm_namespace_sysfs_unregister(ns);
cfs_hash_putref(ns->ns_rs_hash);
+ kfree(ns->ns_name);
/* Namespace \a ns should be not on list at this time, otherwise
* this will cause issues related to using freed \a ns in poold
* thread.



2018-05-01 03:54:04

by NeilBrown

Subject: [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup

The current retry logic, to wait when a 'dying' object is found,
spans multiple functions. The process is attached to a waitqueue
and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
is passed back through lu_object_find_try() to lu_object_find_at()
where schedule() is called and the process is removed from the queue.

This can be simplified by moving all the logic (including
hashtable locking) inside htable_lookup(), which now never returns
EAGAIN.

Note that htable_lookup() is called with the hash bucket lock
held, and will drop and retake it if it needs to schedule.

I made this a 'goto' loop rather than a 'while(1)' loop as the
diff is easier to read.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++-------------
1 file changed, 27 insertions(+), 46 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 2bf089817157..93daa52e2535 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
static struct lu_object *htable_lookup(struct lu_site *s,
struct cfs_hash_bd *bd,
const struct lu_fid *f,
- wait_queue_entry_t *waiter,
__u64 *version)
{
+ struct cfs_hash *hs = s->ls_obj_hash;
struct lu_site_bkt_data *bkt;
struct lu_object_header *h;
struct hlist_node *hnode;
- __u64 ver = cfs_hash_bd_version_get(bd);
+ __u64 ver;
+ wait_queue_entry_t waiter;

- if (*version == ver)
+retry:
+ ver = cfs_hash_bd_version_get(bd);
+
+ if (*version == ver) {
return ERR_PTR(-ENOENT);
+ }

*version = ver;
bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
@@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
* drained), and moreover, lookup has to wait until object is freed.
*/

- init_waitqueue_entry(waiter, current);
- add_wait_queue(&bkt->lsb_marche_funebre, waiter);
+ init_waitqueue_entry(&waiter, current);
+ add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
set_current_state(TASK_UNINTERRUPTIBLE);
lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
- return ERR_PTR(-EAGAIN);
+ cfs_hash_bd_unlock(hs, bd, 1);
+ schedule();
+ remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+ cfs_hash_bd_lock(hs, bd, 1);
+ goto retry;
}

/**
@@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
}

/**
- * Core logic of lu_object_find*() functions.
+ * Much like lu_object_find(), but top level device of object is specifically
+ * \a dev rather than top level device of the site. This interface allows
+ * objects of different "stacking" to be created within the same site.
*/
-static struct lu_object *lu_object_find_try(const struct lu_env *env,
- struct lu_device *dev,
- const struct lu_fid *f,
- const struct lu_object_conf *conf,
- wait_queue_entry_t *waiter)
+struct lu_object *lu_object_find_at(const struct lu_env *env,
+ struct lu_device *dev,
+ const struct lu_fid *f,
+ const struct lu_object_conf *conf)
{
struct lu_object *o;
struct lu_object *shadow;
@@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
* It is unnecessary to perform lookup-alloc-lookup-insert, instead,
* just alloc and insert directly.
*
- * If dying object is found during index search, add @waiter to the
- * site wait-queue and return ERR_PTR(-EAGAIN).
*/
if (conf && conf->loc_flags & LOC_F_NEW)
return lu_object_new(env, dev, f, conf);

s = dev->ld_site;
hs = s->ls_obj_hash;
- cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
- o = htable_lookup(s, &bd, f, waiter, &version);
- cfs_hash_bd_unlock(hs, &bd, 1);
+ cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
+ o = htable_lookup(s, &bd, f, &version);
+ cfs_hash_bd_unlock(hs, &bd, 0);
+
if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
return o;

@@ -751,7 +760,7 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,

cfs_hash_bd_lock(hs, &bd, 1);

- shadow = htable_lookup(s, &bd, f, waiter, &version);
+ shadow = htable_lookup(s, &bd, f, &version);
if (likely(PTR_ERR(shadow) == -ENOENT)) {
cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
cfs_hash_bd_unlock(hs, &bd, 1);
@@ -766,34 +775,6 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
lu_object_free(env, o);
return shadow;
}
-
-/**
- * Much like lu_object_find(), but top level device of object is specifically
- * \a dev rather than top level device of the site. This interface allows
- * objects of different "stacking" to be created within the same site.
- */
-struct lu_object *lu_object_find_at(const struct lu_env *env,
- struct lu_device *dev,
- const struct lu_fid *f,
- const struct lu_object_conf *conf)
-{
- wait_queue_head_t *wq;
- struct lu_object *obj;
- wait_queue_entry_t wait;
-
- while (1) {
- obj = lu_object_find_try(env, dev, f, conf, &wait);
- if (obj != ERR_PTR(-EAGAIN))
- return obj;
- /*
- * lu_object_find_try() already added waiter into the
- * wait queue.
- */
- schedule();
- wq = lu_site_wq_from_fid(dev->ld_site, (void *)f);
- remove_wait_queue(wq, &wait);
- }
-}
EXPORT_SYMBOL(lu_object_find_at);

/**



2018-05-01 03:54:23

by NeilBrown

Subject: [PATCH 05/10] staging: lustre: fold lu_object_new() into lu_object_find_at()

lu_object_new() duplicates a lot of code that is in
lu_object_find_at().
There is no real need for a separate function; it is simpler just
to skip the parts of lu_object_find_at() that we don't
want in the LOC_F_NEW case.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 44 +++++---------------
1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 93daa52e2535..9721b3af8ea8 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -678,29 +678,6 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
false);
}

-static struct lu_object *lu_object_new(const struct lu_env *env,
- struct lu_device *dev,
- const struct lu_fid *f,
- const struct lu_object_conf *conf)
-{
- struct lu_object *o;
- struct cfs_hash *hs;
- struct cfs_hash_bd bd;
-
- o = lu_object_alloc(env, dev, f, conf);
- if (IS_ERR(o))
- return o;
-
- hs = dev->ld_site->ls_obj_hash;
- cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
- cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
- cfs_hash_bd_unlock(hs, &bd, 1);
-
- lu_object_limit(env, dev);
-
- return o;
-}
-
/**
* Much like lu_object_find(), but top level device of object is specifically
* \a dev rather than top level device of the site. This interface allows
@@ -736,18 +713,18 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
* just alloc and insert directly.
*
*/
- if (conf && conf->loc_flags & LOC_F_NEW)
- return lu_object_new(env, dev, f, conf);
-
s = dev->ld_site;
hs = s->ls_obj_hash;
- cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
- o = htable_lookup(s, &bd, f, &version);
- cfs_hash_bd_unlock(hs, &bd, 0);

- if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
- return o;
+ cfs_hash_bd_get(hs, f, &bd);
+ if (!(conf && conf->loc_flags & LOC_F_NEW)) {
+ cfs_hash_bd_lock(hs, &bd, 0);
+ o = htable_lookup(s, &bd, f, &version);
+ cfs_hash_bd_unlock(hs, &bd, 0);

+ if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
+ return o;
+ }
/*
* Allocate new object. This may result in rather complicated
* operations, including fld queries, inode loading, etc.
@@ -760,7 +737,10 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,

cfs_hash_bd_lock(hs, &bd, 1);

- shadow = htable_lookup(s, &bd, f, &version);
+ if (conf && conf->loc_flags & LOC_F_NEW)
+ shadow = ERR_PTR(-ENOENT);
+ else
+ shadow = htable_lookup(s, &bd, f, &version);
if (likely(PTR_ERR(shadow) == -ENOENT)) {
cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
cfs_hash_bd_unlock(hs, &bd, 1);



2018-05-01 03:54:34

by NeilBrown

Subject: [PATCH 06/10] staging: lustre: llite: use more private data in dump_pgcache

The dump_page_cache debugfs file allocates and frees an 'env' on each
call to vvp_pgcache_start/next/show. This is likely to be fast, but
it does introduce the need to check for errors.

It is reasonable to allocate a single 'env' when the file is opened,
and use that throughout.

So create a 'seq_private' structure which stores the sbi, env, and
refcheck, and attach it to the seq_file.

Then use it throughout instead of allocating 'env' repeatedly.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/llite/vvp_dev.c | 150 ++++++++++++-------------
1 file changed, 72 insertions(+), 78 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 987c03b058e6..a2619dc04a7f 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -390,6 +390,12 @@ struct vvp_pgcache_id {
struct lu_object_header *vpi_obj;
};

+struct seq_private {
+ struct ll_sb_info *sbi;
+ struct lu_env *env;
+ u16 refcheck;
+};
+
static void vvp_pgcache_id_unpack(loff_t pos, struct vvp_pgcache_id *id)
{
BUILD_BUG_ON(sizeof(pos) != sizeof(__u64));
@@ -531,95 +537,71 @@ static void vvp_pgcache_page_show(const struct lu_env *env,

static int vvp_pgcache_show(struct seq_file *f, void *v)
{
+ struct seq_private *priv = f->private;
loff_t pos;
- struct ll_sb_info *sbi;
struct cl_object *clob;
- struct lu_env *env;
struct vvp_pgcache_id id;
- u16 refcheck;
- int result;

- env = cl_env_get(&refcheck);
- if (!IS_ERR(env)) {
- pos = *(loff_t *)v;
- vvp_pgcache_id_unpack(pos, &id);
- sbi = f->private;
- clob = vvp_pgcache_obj(env, &sbi->ll_cl->cd_lu_dev, &id);
- if (clob) {
- struct inode *inode = vvp_object_inode(clob);
- struct cl_page *page = NULL;
- struct page *vmpage;
-
- result = find_get_pages_contig(inode->i_mapping,
- id.vpi_index, 1,
- &vmpage);
- if (result > 0) {
- lock_page(vmpage);
- page = cl_vmpage_page(vmpage, clob);
- unlock_page(vmpage);
- put_page(vmpage);
- }
+ pos = *(loff_t *)v;
+ vvp_pgcache_id_unpack(pos, &id);
+ clob = vvp_pgcache_obj(priv->env, &priv->sbi->ll_cl->cd_lu_dev, &id);
+ if (clob) {
+ struct inode *inode = vvp_object_inode(clob);
+ struct cl_page *page = NULL;
+ struct page *vmpage;
+ int result;
+
+ result = find_get_pages_contig(inode->i_mapping,
+ id.vpi_index, 1,
+ &vmpage);
+ if (result > 0) {
+ lock_page(vmpage);
+ page = cl_vmpage_page(vmpage, clob);
+ unlock_page(vmpage);
+ put_page(vmpage);
+ }

- seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
- PFID(lu_object_fid(&clob->co_lu)));
- if (page) {
- vvp_pgcache_page_show(env, f, page);
- cl_page_put(env, page);
- } else {
- seq_puts(f, "missing\n");
- }
- lu_object_ref_del(&clob->co_lu, "dump", current);
- cl_object_put(env, clob);
+ seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
+ PFID(lu_object_fid(&clob->co_lu)));
+ if (page) {
+ vvp_pgcache_page_show(priv->env, f, page);
+ cl_page_put(priv->env, page);
} else {
- seq_printf(f, "%llx missing\n", pos);
+ seq_puts(f, "missing\n");
}
- cl_env_put(env, &refcheck);
- result = 0;
+ lu_object_ref_del(&clob->co_lu, "dump", current);
+ cl_object_put(priv->env, clob);
} else {
- result = PTR_ERR(env);
+ seq_printf(f, "%llx missing\n", pos);
}
- return result;
+ return 0;
}

static void *vvp_pgcache_start(struct seq_file *f, loff_t *pos)
{
- struct ll_sb_info *sbi;
- struct lu_env *env;
- u16 refcheck;
-
- sbi = f->private;
+ struct seq_private *priv = f->private;

- env = cl_env_get(&refcheck);
- if (!IS_ERR(env)) {
- sbi = f->private;
- if (sbi->ll_site->ls_obj_hash->hs_cur_bits >
- 64 - PGC_OBJ_SHIFT) {
- pos = ERR_PTR(-EFBIG);
- } else {
- *pos = vvp_pgcache_find(env, &sbi->ll_cl->cd_lu_dev,
- *pos);
- if (*pos == ~0ULL)
- pos = NULL;
- }
- cl_env_put(env, &refcheck);
+ if (priv->sbi->ll_site->ls_obj_hash->hs_cur_bits >
+ 64 - PGC_OBJ_SHIFT) {
+ pos = ERR_PTR(-EFBIG);
+ } else {
+ *pos = vvp_pgcache_find(priv->env, &priv->sbi->ll_cl->cd_lu_dev,
+ *pos);
+ if (*pos == ~0ULL)
+ pos = NULL;
}
+
return pos;
}

static void *vvp_pgcache_next(struct seq_file *f, void *v, loff_t *pos)
{
- struct ll_sb_info *sbi;
- struct lu_env *env;
- u16 refcheck;
+ struct seq_private *priv = f->private;
+
+ *pos = vvp_pgcache_find(priv->env, &priv->sbi->ll_cl->cd_lu_dev, *pos + 1);
+ if (*pos == ~0ULL)
+ pos = NULL;

- env = cl_env_get(&refcheck);
- if (!IS_ERR(env)) {
- sbi = f->private;
- *pos = vvp_pgcache_find(env, &sbi->ll_cl->cd_lu_dev, *pos + 1);
- if (*pos == ~0ULL)
- pos = NULL;
- cl_env_put(env, &refcheck);
- }
return pos;
}

@@ -637,17 +619,29 @@ static const struct seq_operations vvp_pgcache_ops = {

static int vvp_dump_pgcache_seq_open(struct inode *inode, struct file *filp)
{
- struct seq_file *seq;
- int rc;
-
- rc = seq_open(filp, &vvp_pgcache_ops);
- if (rc)
- return rc;
+ struct seq_private *priv;
+
+ priv = __seq_open_private(filp, &vvp_pgcache_ops, sizeof(*priv));
+ if (!priv)
+ return -ENOMEM;
+
+ priv->sbi = inode->i_private;
+ priv->env = cl_env_get(&priv->refcheck);
+ if (IS_ERR(priv->env)) {
+ int err = PTR_ERR(priv->env);
+ seq_release_private(inode, filp);
+ return err;
+ }
+ return 0;
+}

- seq = filp->private_data;
- seq->private = inode->i_private;
+static int vvp_dump_pgcache_seq_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *seq = file->private_data;
+ struct seq_private *priv = seq->private;

- return 0;
+ cl_env_put(priv->env, &priv->refcheck);
+ return seq_release_private(inode, file);
}

const struct file_operations vvp_dump_pgcache_file_ops = {
@@ -655,5 +649,5 @@ const struct file_operations vvp_dump_pgcache_file_ops = {
.open = vvp_dump_pgcache_seq_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = vvp_dump_pgcache_seq_release,
};



2018-05-01 03:54:46

by NeilBrown

Subject: [PATCH 08/10] staging: lustre: move misc-device registration closer to related code.

The ioctl handler for the misc device is in lnet/libcfs/module.c
but it is registered in lnet/libcfs/linux/linux-module.c.

Keeping related code together makes maintenance easier, so move the
code.

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/include/linux/libcfs/libcfs.h | 2 -
.../lustre/lnet/libcfs/linux/linux-module.c | 28 ------------------
drivers/staging/lustre/lnet/libcfs/module.c | 31 +++++++++++++++++++-
3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index 6e7754b2f296..9263e151451b 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -141,11 +141,9 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
const struct libcfs_ioctl_hdr __user *uparam);
int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
-int libcfs_ioctl(unsigned long cmd, void __user *arg);

#define _LIBCFS_H

-extern struct miscdevice libcfs_dev;
/**
* The path of debug log dump upcall script.
*/
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
index c8908e816c4c..954b681f9db7 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
@@ -166,31 +166,3 @@ int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
kvfree(*hdr_pp);
return err;
}
-
-static long
-libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
-
- if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
- _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
- _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
- CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
- _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
- return -EINVAL;
- }
-
- return libcfs_ioctl(cmd, (void __user *)arg);
-}
-
-static const struct file_operations libcfs_fops = {
- .owner = THIS_MODULE,
- .unlocked_ioctl = libcfs_psdev_ioctl,
-};
-
-struct miscdevice libcfs_dev = {
- .minor = MISC_DYNAMIC_MINOR,
- .name = "lnet",
- .fops = &libcfs_fops,
-};
diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index 4b9acd7bc5cf..3fb150a57f49 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -95,7 +95,7 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
}
EXPORT_SYMBOL(libcfs_deregister_ioctl);

-int libcfs_ioctl(unsigned long cmd, void __user *uparam)
+static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
{
struct libcfs_ioctl_data *data = NULL;
struct libcfs_ioctl_hdr *hdr;
@@ -161,6 +161,35 @@ int libcfs_ioctl(unsigned long cmd, void __user *uparam)
return err;
}

+
+static long
+libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
+ _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
+ _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
+ CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
+ _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
+ return -EINVAL;
+ }
+
+ return libcfs_ioctl(cmd, (void __user *)arg);
+}
+
+static const struct file_operations libcfs_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = libcfs_psdev_ioctl,
+};
+
+struct miscdevice libcfs_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "lnet",
+ .fops = &libcfs_fops,
+};
+
int lprocfs_call_handler(void *data, int write, loff_t *ppos,
void __user *buffer, size_t *lenp,
int (*handler)(void *data, int write, loff_t pos,



2018-05-01 04:04:53

by Dilger, Andreas

Subject: Re: [PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.

On Apr 30, 2018, at 21:52, NeilBrown <[email protected]> wrote:
>
> Rather than storing the name of a namespace in the
> hash table, store it directly in the namespace.
> This will allow the hashtable to be changed to use
> rhashtable.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-05-01 08:23:43

by Dilger, Andreas

Subject: Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup

On Apr 30, 2018, at 21:52, NeilBrown <[email protected]> wrote:
>
> The current retry logic, to wait when a 'dying' object is found,
> spans multiple functions. The process is attached to a waitqueue
> and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
> is passed back through lu_object_find_try() to lu_object_find_at()
> where schedule() is called and the process is removed from the queue.
>
> This can be simplified by moving all the logic (including
> hashtable locking) inside htable_lookup(), which now never returns
> EAGAIN.
>
> Note that htable_lookup() is called with the hash bucket lock
> held, and will drop and retake it if it needs to schedule.
>
> I made this a 'goto' loop rather than a 'while(1)' loop as the
> diff is easier to read.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++-------------
> 1 file changed, 27 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index 2bf089817157..93daa52e2535 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
> static struct lu_object *htable_lookup(struct lu_site *s,

It's probably a good idea to add a comment for this function that it may
drop and re-acquire the hash bucket lock internally.

> struct cfs_hash_bd *bd,
> const struct lu_fid *f,
> - wait_queue_entry_t *waiter,
> __u64 *version)
> {
> + struct cfs_hash *hs = s->ls_obj_hash;
> struct lu_site_bkt_data *bkt;
> struct lu_object_header *h;
> struct hlist_node *hnode;
> - __u64 ver = cfs_hash_bd_version_get(bd);
> + __u64 ver;
> + wait_queue_entry_t waiter;
>
> - if (*version == ver)
> +retry:
> + ver = cfs_hash_bd_version_get(bd);
> +
> + if (*version == ver) {
> return ERR_PTR(-ENOENT);
> + }

(style) we don't need the {} around a single-line if statement

> *version = ver;
> bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
> @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> * drained), and moreover, lookup has to wait until object is freed.
> */
>
> - init_waitqueue_entry(waiter, current);
> - add_wait_queue(&bkt->lsb_marche_funebre, waiter);
> + init_waitqueue_entry(&waiter, current);
> + add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> set_current_state(TASK_UNINTERRUPTIBLE);
> lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
> - return ERR_PTR(-EAGAIN);
> + cfs_hash_bd_unlock(hs, bd, 1);

This looks like it isn't unlocking and locking the hash bucket in the same
manner that it was done in the caller. Here excl = 1, but in the caller
you changed it to excl = 0?

> + schedule();
> + remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);

Is it worthwhile to use your new helper function here to get the wq from "s"?

> + cfs_hash_bd_lock(hs, bd, 1);
> + goto retry;
> }
>
> /**
> @@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
> }
>
> /**
> - * Core logic of lu_object_find*() functions.
> + * Much like lu_object_find(), but top level device of object is specifically
> + * \a dev rather than top level device of the site. This interface allows
> + * objects of different "stacking" to be created within the same site.
> */
> -static struct lu_object *lu_object_find_try(const struct lu_env *env,
> - struct lu_device *dev,
> - const struct lu_fid *f,
> - const struct lu_object_conf *conf,
> - wait_queue_entry_t *waiter)
> +struct lu_object *lu_object_find_at(const struct lu_env *env,
> + struct lu_device *dev,
> + const struct lu_fid *f,
> + const struct lu_object_conf *conf)
> {
> struct lu_object *o;
> struct lu_object *shadow;
> @@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
> * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
> * just alloc and insert directly.
> *
> - * If dying object is found during index search, add @waiter to the
> - * site wait-queue and return ERR_PTR(-EAGAIN).
> */
> if (conf && conf->loc_flags & LOC_F_NEW)
> return lu_object_new(env, dev, f, conf);
>
> s = dev->ld_site;
> hs = s->ls_obj_hash;
> - cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
> - o = htable_lookup(s, &bd, f, waiter, &version);
> - cfs_hash_bd_unlock(hs, &bd, 1);
> + cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
> + o = htable_lookup(s, &bd, f, &version);
> + cfs_hash_bd_unlock(hs, &bd, 0);

Here you changed the locking to a non-exclusive (read) lock instead of an
exclusive (write) lock? Why?

> +
> if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
> return o;
>
> @@ -751,7 +760,7 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
>
> cfs_hash_bd_lock(hs, &bd, 1);
>
> - shadow = htable_lookup(s, &bd, f, waiter, &version);
> + shadow = htable_lookup(s, &bd, f, &version);
> if (likely(PTR_ERR(shadow) == -ENOENT)) {
> cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
> cfs_hash_bd_unlock(hs, &bd, 1);
> @@ -766,34 +775,6 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
> lu_object_free(env, o);
> return shadow;
> }
> -
> -/**
> - * Much like lu_object_find(), but top level device of object is specifically
> - * \a dev rather than top level device of the site. This interface allows
> - * objects of different "stacking" to be created within the same site.
> - */
> -struct lu_object *lu_object_find_at(const struct lu_env *env,
> - struct lu_device *dev,
> - const struct lu_fid *f,
> - const struct lu_object_conf *conf)
> -{
> - wait_queue_head_t *wq;
> - struct lu_object *obj;
> - wait_queue_entry_t wait;
> -
> - while (1) {
> - obj = lu_object_find_try(env, dev, f, conf, &wait);
> - if (obj != ERR_PTR(-EAGAIN))
> - return obj;
> - /*
> - * lu_object_find_try() already added waiter into the
> - * wait queue.
> - */
> - schedule();
> - wq = lu_site_wq_from_fid(dev->ld_site, (void *)f);
> - remove_wait_queue(wq, &wait);
> - }
> -}
> EXPORT_SYMBOL(lu_object_find_at);
>
> /**
>
>
> _______________________________________________
> lustre-devel mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-05-02 18:13:01

by James Simmons

Subject: Re: [PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.


> Rather than storing the name of a namespace in the
> hash table, store it directly in the namespace.
> This will allow the hashtable to be changed to use
> rhashtable.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: James Simmons <[email protected]>

> ---
> drivers/staging/lustre/lustre/include/lustre_dlm.h | 5 ++++-
> drivers/staging/lustre/lustre/ldlm/ldlm_resource.c | 5 +++++
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> index d668d86423a4..b3532adac31c 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> @@ -362,6 +362,9 @@ struct ldlm_namespace {
> /** Flag indicating if namespace is on client instead of server */
> enum ldlm_side ns_client;
>
> + /** name of this namespace */
> + char *ns_name;
> +
> /** Resource hash table for namespace. */
> struct cfs_hash *ns_rs_hash;
>
> @@ -878,7 +881,7 @@ static inline bool ldlm_has_layout(struct ldlm_lock *lock)
> static inline char *
> ldlm_ns_name(struct ldlm_namespace *ns)
> {
> - return ns->ns_rs_hash->hs_name;
> + return ns->ns_name;
> }
>
> static inline struct ldlm_namespace *
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> index 6c615b6e9bdc..43bbc5fd94cc 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> @@ -688,6 +688,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
> ns->ns_obd = obd;
> ns->ns_appetite = apt;
> ns->ns_client = client;
> + ns->ns_name = kstrdup(name, GFP_KERNEL);
> + if (!ns->ns_name)
> + goto out_hash;
>
> INIT_LIST_HEAD(&ns->ns_list_chain);
> INIT_LIST_HEAD(&ns->ns_unused_list);
> @@ -730,6 +733,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
> ldlm_namespace_sysfs_unregister(ns);
> ldlm_namespace_cleanup(ns, 0);
> out_hash:
> + kfree(ns->ns_name);
> cfs_hash_putref(ns->ns_rs_hash);
> out_ns:
> kfree(ns);
> @@ -993,6 +997,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns)
> ldlm_namespace_debugfs_unregister(ns);
> ldlm_namespace_sysfs_unregister(ns);
> cfs_hash_putref(ns->ns_rs_hash);
> + kfree(ns->ns_name);
> /* Namespace \a ns should be not on list at this time, otherwise
> * this will cause issues related to using freed \a ns in poold
> * thread.
>
>
>

2018-05-02 18:18:08

by James Simmons

[permalink] [raw]
Subject: Re: [PATCH 08/10] staging: lustre: move misc-device registration closer to related code.


> The ioctl handler for the misc device is in lnet/libcfs/module.c
> but it is registered in lnet/libcfs/linux/linux-module.c.
>
> Keeping related code together makes maintenance easier, so move the
> code.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: James Simmons <[email protected]>

> ---
> .../staging/lustre/include/linux/libcfs/libcfs.h | 2 -
> .../lustre/lnet/libcfs/linux/linux-module.c | 28 ------------------
> drivers/staging/lustre/lnet/libcfs/module.c | 31 +++++++++++++++++++-
> 3 files changed, 30 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> index 6e7754b2f296..9263e151451b 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> @@ -141,11 +141,9 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
> int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
> const struct libcfs_ioctl_hdr __user *uparam);
> int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
> -int libcfs_ioctl(unsigned long cmd, void __user *arg);
>
> #define _LIBCFS_H
>
> -extern struct miscdevice libcfs_dev;
> /**
> * The path of debug log dump upcall script.
> */
> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
> index c8908e816c4c..954b681f9db7 100644
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
> +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
> @@ -166,31 +166,3 @@ int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
> kvfree(*hdr_pp);
> return err;
> }
> -
> -static long
> -libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> -{
> - if (!capable(CAP_SYS_ADMIN))
> - return -EACCES;
> -
> - if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
> - _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
> - _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
> - CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
> - _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
> - return -EINVAL;
> - }
> -
> - return libcfs_ioctl(cmd, (void __user *)arg);
> -}
> -
> -static const struct file_operations libcfs_fops = {
> - .owner = THIS_MODULE,
> - .unlocked_ioctl = libcfs_psdev_ioctl,
> -};
> -
> -struct miscdevice libcfs_dev = {
> - .minor = MISC_DYNAMIC_MINOR,
> - .name = "lnet",
> - .fops = &libcfs_fops,
> -};
> diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
> index 4b9acd7bc5cf..3fb150a57f49 100644
> --- a/drivers/staging/lustre/lnet/libcfs/module.c
> +++ b/drivers/staging/lustre/lnet/libcfs/module.c
> @@ -95,7 +95,7 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
> }
> EXPORT_SYMBOL(libcfs_deregister_ioctl);
>
> -int libcfs_ioctl(unsigned long cmd, void __user *uparam)
> +static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
> {
> struct libcfs_ioctl_data *data = NULL;
> struct libcfs_ioctl_hdr *hdr;
> @@ -161,6 +161,35 @@ int libcfs_ioctl(unsigned long cmd, void __user *uparam)
> return err;
> }
>
> +
> +static long
> +libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
> + _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
> + _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
> + CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
> + _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
> + return -EINVAL;
> + }
> +
> + return libcfs_ioctl(cmd, (void __user *)arg);
> +}
> +
> +static const struct file_operations libcfs_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = libcfs_psdev_ioctl,
> +};
> +
> +struct miscdevice libcfs_dev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "lnet",
> + .fops = &libcfs_fops,
> +};
> +
> int lprocfs_call_handler(void *data, int write, loff_t *ppos,
> void __user *buffer, size_t *lenp,
> int (*handler)(void *data, int write, loff_t pos,
>
>
>

2018-05-02 18:21:55

by James Simmons

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup


> On Apr 30, 2018, at 21:52, NeilBrown <[email protected]> wrote:
> >
> > The current retry logic, to wait when a 'dying' object is found,
> > spans multiple functions. The process is attached to a waitqueue
> > and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
> > is passed back through lu_object_find_try() to lu_object_find_at()
> > where schedule() is called and the process is removed from the queue.
> >
> > This can be simplified by moving all the logic (including
> > hashtable locking) inside htable_lookup(), which now never returns
> > EAGAIN.
> >
> > Note that htable_lookup() is called with the hash bucket lock
> > held, and will drop and retake it if it needs to schedule.
> >
> > I made this a 'goto' loop rather than a 'while(1)' loop as the
> > diff is easier to read.
> >
> > Signed-off-by: NeilBrown <[email protected]>
> > ---
> > drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++-------------
> > 1 file changed, 27 insertions(+), 46 deletions(-)
> >
> > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > index 2bf089817157..93daa52e2535 100644
> > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> > @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
> > static struct lu_object *htable_lookup(struct lu_site *s,
>
> It's probably a good idea to add a comment for this function that it may
> drop and re-acquire the hash bucket lock internally.
>
> > struct cfs_hash_bd *bd,
> > const struct lu_fid *f,
> > - wait_queue_entry_t *waiter,
> > __u64 *version)
> > {
> > + struct cfs_hash *hs = s->ls_obj_hash;
> > struct lu_site_bkt_data *bkt;
> > struct lu_object_header *h;
> > struct hlist_node *hnode;
> > - __u64 ver = cfs_hash_bd_version_get(bd);
> > + __u64 ver;
> > + wait_queue_entry_t waiter;
> >
> > - if (*version == ver)
> > +retry:
> > + ver = cfs_hash_bd_version_get(bd);
> > +
> > + if (*version == ver) {
> > return ERR_PTR(-ENOENT);
> > + }
>
> (style) we don't need the {} around a single-line if statement

I hate to be that guy, but could you run checkpatch on your patches?

> > *version = ver;
> > bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
> > @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
> > * drained), and moreover, lookup has to wait until object is freed.
> > */
> >
> > - init_waitqueue_entry(waiter, current);
> > - add_wait_queue(&bkt->lsb_marche_funebre, waiter);
> > + init_waitqueue_entry(&waiter, current);
> > + add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
> > set_current_state(TASK_UNINTERRUPTIBLE);
> > lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
> > - return ERR_PTR(-EAGAIN);
> > + cfs_hash_bd_unlock(hs, bd, 1);
>
> This looks like it isn't unlocking and locking the hash bucket in the same
> manner that it was done in the caller. Here excl = 1, but in the caller
> you changed it to excl = 0?

This is very much like the work done by Lai. The difference is that Lai
removed the wait queue handling completely in htable_lookup(). You can see
the details at https://jira.hpdd.intel.com/browse/LU-9049. I will push the
missing lu_object fixes, including LU-9049, on top of your patch set so you
can see the approach Lai took. From there we can figure out how to merge the
lu_object work and fix the issues Andreas and I pointed out.

> > + schedule();
> > + remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
>
> Is it worthwhile to use your new helper function here to get the wq from "s"?
>
> > + cfs_hash_bd_lock(hs, bd, 1);
> > + goto retry;
> > }
> >
> > /**
> > @@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
> > }
> >
> > /**
> > - * Core logic of lu_object_find*() functions.
> > + * Much like lu_object_find(), but top level device of object is specifically
> > + * \a dev rather than top level device of the site. This interface allows
> > + * objects of different "stacking" to be created within the same site.
> > */
> > -static struct lu_object *lu_object_find_try(const struct lu_env *env,
> > - struct lu_device *dev,
> > - const struct lu_fid *f,
> > - const struct lu_object_conf *conf,
> > - wait_queue_entry_t *waiter)
> > +struct lu_object *lu_object_find_at(const struct lu_env *env,
> > + struct lu_device *dev,
> > + const struct lu_fid *f,
> > + const struct lu_object_conf *conf)
> > {
> > struct lu_object *o;
> > struct lu_object *shadow;
> > @@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
> > * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
> > * just alloc and insert directly.
> > *
> > - * If dying object is found during index search, add @waiter to the
> > - * site wait-queue and return ERR_PTR(-EAGAIN).
> > */
> > if (conf && conf->loc_flags & LOC_F_NEW)
> > return lu_object_new(env, dev, f, conf);
> >
> > s = dev->ld_site;
> > hs = s->ls_obj_hash;
> > - cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
> > - o = htable_lookup(s, &bd, f, waiter, &version);
> > - cfs_hash_bd_unlock(hs, &bd, 1);
> > + cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
> > + o = htable_lookup(s, &bd, f, &version);
> > + cfs_hash_bd_unlock(hs, &bd, 0);
>
> Here you changed the locking to a non-exclusive (read) lock instead of an
> exclusive (write) lock? Why.

I have the same question.

>
> > +
> > if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
> > return o;
> >
> > @@ -751,7 +760,7 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
> >
> > cfs_hash_bd_lock(hs, &bd, 1);
> >
> > - shadow = htable_lookup(s, &bd, f, waiter, &version);
> > + shadow = htable_lookup(s, &bd, f, &version);
> > if (likely(PTR_ERR(shadow) == -ENOENT)) {
> > cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
> > cfs_hash_bd_unlock(hs, &bd, 1);
> > @@ -766,34 +775,6 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
> > lu_object_free(env, o);
> > return shadow;
> > }
> > -
> > -/**
> > - * Much like lu_object_find(), but top level device of object is specifically
> > - * \a dev rather than top level device of the site. This interface allows
> > - * objects of different "stacking" to be created within the same site.
> > - */
> > -struct lu_object *lu_object_find_at(const struct lu_env *env,
> > - struct lu_device *dev,
> > - const struct lu_fid *f,
> > - const struct lu_object_conf *conf)
> > -{
> > - wait_queue_head_t *wq;
> > - struct lu_object *obj;
> > - wait_queue_entry_t wait;
> > -
> > - while (1) {
> > - obj = lu_object_find_try(env, dev, f, conf, &wait);
> > - if (obj != ERR_PTR(-EAGAIN))
> > - return obj;
> > - /*
> > - * lu_object_find_try() already added waiter into the
> > - * wait queue.
> > - */
> > - schedule();
> > - wq = lu_site_wq_from_fid(dev->ld_site, (void *)f);
> > - remove_wait_queue(wq, &wait);
> > - }
> > -}
> > EXPORT_SYMBOL(lu_object_find_at);
> >
> > /**
> >
> >
> > _______________________________________________
> > lustre-devel mailing list
> > [email protected]
> > http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>

2018-05-04 00:31:40

by NeilBrown

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup

On Wed, May 02 2018, James Simmons wrote:

>> On Apr 30, 2018, at 21:52, NeilBrown <[email protected]> wrote:
>> >
>> > The current retry logic, to wait when a 'dying' object is found,
>> > spans multiple functions. The process is attached to a waitqueue
>> > and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
>> > is passed back through lu_object_find_try() to lu_object_find_at()
>> > where schedule() is called and the process is removed from the queue.
>> >
>> > This can be simplified by moving all the logic (including
>> > hashtable locking) inside htable_lookup(), which now never returns
>> > EAGAIN.
>> >
>> > Note that htable_lookup() is called with the hash bucket lock
>> > held, and will drop and retake it if it needs to schedule.
>> >
>> > I made this a 'goto' loop rather than a 'while(1)' loop as the
>> > diff is easier to read.
>> >
>> > Signed-off-by: NeilBrown <[email protected]>
>> > ---
>> > drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++-------------
>> > 1 file changed, 27 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> > index 2bf089817157..93daa52e2535 100644
>> > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> > @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
>> > static struct lu_object *htable_lookup(struct lu_site *s,
>>
>> It's probably a good idea to add a comment for this function that it may
>> drop and re-acquire the hash bucket lock internally.
>>
>> > struct cfs_hash_bd *bd,
>> > const struct lu_fid *f,
>> > - wait_queue_entry_t *waiter,
>> > __u64 *version)
>> > {
>> > + struct cfs_hash *hs = s->ls_obj_hash;
>> > struct lu_site_bkt_data *bkt;
>> > struct lu_object_header *h;
>> > struct hlist_node *hnode;
>> > - __u64 ver = cfs_hash_bd_version_get(bd);
>> > + __u64 ver;
>> > + wait_queue_entry_t waiter;
>> >
>> > - if (*version == ver)
>> > +retry:
>> > + ver = cfs_hash_bd_version_get(bd);
>> > +
>> > + if (*version == ver) {
>> > return ERR_PTR(-ENOENT);
>> > + }
>>
>> (style) we don't need the {} around a single-line if statement
>
> I hate to be that guy, but could you run checkpatch on your patches?
>

Someone's got to be "that guy" - thanks.
I have (at last) modified my patch-preparation script to run checkpatch
and show me all the errors that I'm about to post.


>> > *version = ver;
>> > bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
>> > @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
>> > * drained), and moreover, lookup has to wait until object is freed.
>> > */
>> >
>> > - init_waitqueue_entry(waiter, current);
>> > - add_wait_queue(&bkt->lsb_marche_funebre, waiter);
>> > + init_waitqueue_entry(&waiter, current);
>> > + add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
>> > set_current_state(TASK_UNINTERRUPTIBLE);
>> > lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
>> > - return ERR_PTR(-EAGAIN);
>> > + cfs_hash_bd_unlock(hs, bd, 1);
>>
>> This looks like it isn't unlocking and locking the hash bucket in the same
>> manner that it was done in the caller. Here excl = 1, but in the caller
>> you changed it to excl = 0?
>
> This is very much like the work done by Lai. The difference is that Lai
> removed the wait queue handling completely in htable_lookup(). You can see
> the details at https://jira.hpdd.intel.com/browse/LU-9049. I will push the
> missing lu_object fixes, including LU-9049, on top of your patch set so you
> can see the approach Lai took. From there we can figure out how to merge the
> lu_object work and fix the issues Andreas and I pointed out.

I think I did see that before but didn't feel I understood it well enough to
do anything with, so I deferred it. Having the patches that you
provided, I think it is starting to make more sense. Once I resubmit
this current series I'll have a closer look. Probably we can just
apply the series you sent on top of mine - I might even combine the two
- and then think about whatever else needs doing.

Thanks,
NeilBrown



2018-05-04 01:31:10

by NeilBrown

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup

On Tue, May 01 2018, Dilger, Andreas wrote:

> On Apr 30, 2018, at 21:52, NeilBrown <[email protected]> wrote:
>>
>> The current retry logic, to wait when a 'dying' object is found,
>> spans multiple functions. The process is attached to a waitqueue
>> and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
>> is passed back through lu_object_find_try() to lu_object_find_at()
>> where schedule() is called and the process is removed from the queue.
>>
>> This can be simplified by moving all the logic (including
>> hashtable locking) inside htable_lookup(), which now never returns
>> EAGAIN.
>>
>> Note that htable_lookup() is called with the hash bucket lock
>> held, and will drop and retake it if it needs to schedule.
>>
>> I made this a 'goto' loop rather than a 'while(1)' loop as the
>> diff is easier to read.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++-------------
>> 1 file changed, 27 insertions(+), 46 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> index 2bf089817157..93daa52e2535 100644
>> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
>> @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
>> static struct lu_object *htable_lookup(struct lu_site *s,
>
> It's probably a good idea to add a comment for this function that it may
> drop and re-acquire the hash bucket lock internally.

Added - thanks.

>
>> struct cfs_hash_bd *bd,
>> const struct lu_fid *f,
>> - wait_queue_entry_t *waiter,
>> __u64 *version)
>> {
>> + struct cfs_hash *hs = s->ls_obj_hash;
>> struct lu_site_bkt_data *bkt;
>> struct lu_object_header *h;
>> struct hlist_node *hnode;
>> - __u64 ver = cfs_hash_bd_version_get(bd);
>> + __u64 ver;
>> + wait_queue_entry_t waiter;
>>
>> - if (*version == ver)
>> +retry:
>> + ver = cfs_hash_bd_version_get(bd);
>> +
>> + if (*version == ver) {
>> return ERR_PTR(-ENOENT);
>> + }
>
> (style) we don't need the {} around a single-line if statement

Fixed.

>
>> *version = ver;
>> bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
>> @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
>> * drained), and moreover, lookup has to wait until object is freed.
>> */
>>
>> - init_waitqueue_entry(waiter, current);
>> - add_wait_queue(&bkt->lsb_marche_funebre, waiter);
>> + init_waitqueue_entry(&waiter, current);
>> + add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
>> set_current_state(TASK_UNINTERRUPTIBLE);
>> lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
>> - return ERR_PTR(-EAGAIN);
>> + cfs_hash_bd_unlock(hs, bd, 1);
>
> This looks like it isn't unlocking and locking the hash bucket in the same
> manner that it was done in the caller. Here excl = 1, but in the caller
> you changed it to excl = 0?

Don't know what happened there... though as the table is created with
CFS_HASH_SPIN_BLKLOCK it doesn't make any behavioral difference.
I've put it back to using '1' uniformly.

>
>> + schedule();
>> + remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
>
> Is it worthwhile to use your new helper function here to get the wq from "s"?

I don't think so. We already have the 'bkt' and it seems pointless to
compute the hash a second time and use it to find the bucket and then
the queue, just to use a nice wrapper function.


>
>> + cfs_hash_bd_lock(hs, bd, 1);
>> + goto retry;
>> }
>>
>> /**
>> @@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
>> }
>>
>> /**
>> - * Core logic of lu_object_find*() functions.
>> + * Much like lu_object_find(), but top level device of object is specifically
>> + * \a dev rather than top level device of the site. This interface allows
>> + * objects of different "stacking" to be created within the same site.
>> */
>> -static struct lu_object *lu_object_find_try(const struct lu_env *env,
>> - struct lu_device *dev,
>> - const struct lu_fid *f,
>> - const struct lu_object_conf *conf,
>> - wait_queue_entry_t *waiter)
>> +struct lu_object *lu_object_find_at(const struct lu_env *env,
>> + struct lu_device *dev,
>> + const struct lu_fid *f,
>> + const struct lu_object_conf *conf)
>> {
>> struct lu_object *o;
>> struct lu_object *shadow;
>> @@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
>> * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
>> * just alloc and insert directly.
>> *
>> - * If dying object is found during index search, add @waiter to the
>> - * site wait-queue and return ERR_PTR(-EAGAIN).
>> */
>> if (conf && conf->loc_flags & LOC_F_NEW)
>> return lu_object_new(env, dev, f, conf);
>>
>> s = dev->ld_site;
>> hs = s->ls_obj_hash;
>> - cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
>> - o = htable_lookup(s, &bd, f, waiter, &version);
>> - cfs_hash_bd_unlock(hs, &bd, 1);
>> + cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
>> + o = htable_lookup(s, &bd, f, &version);
>> + cfs_hash_bd_unlock(hs, &bd, 0);
>
> Here you changed the locking to a non-exclusive (read) lock instead of an
> exclusive (write) lock? Why.

Carelessness is all. And the fact that my thinking was focused on
rhashtable which doesn't make that distinction. Fixed.

Thanks,
NeilBrown

