2018-03-01 23:34:37

by NeilBrown

Subject: [PATCH 00/17] staging: remove requirement that lustre be built as module

The main focus of this series is to remove the requirement that
lustre be built as a module. This involves moving some initialization
out of the module_init functions and putting it elsewhere.

This led me to look at the various kthreads that are used, resulting in
some of them being changed to simpler work-queues.

Lustre has a 'struct ptlrpc_thread' data structure which appears to
have been created to manage the multiple threads for ptlrpcd - and it
does this quite effectively.
It was also used for managing a few other threads, and it was much
less suitable for these. Those which haven't been changed to
workqueues have been changed to use kthread interfaces directly.
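As a rough illustration of the workqueue conversions (hypothetical names,
not code from any patch in this series - it cannot build outside a kernel
tree), a periodic kthread loop of the kind removed here typically becomes
a self re-arming delayed_work:

```c
#include <linux/workqueue.h>

static void recalc_fn(struct work_struct *ws);
static DECLARE_DELAYED_WORK(recalc_work, recalc_fn);

static void recalc_fn(struct work_struct *ws)
{
	/* ... the body that the kthread's while-loop used to run ... */

	schedule_delayed_work(&recalc_work, 30 * HZ);	/* re-arm */
}

/* start-up: replaces kthread_run() */
/*	schedule_delayed_work(&recalc_work, 0);        */
/* tear-down: replaces the stop-flag/wait_event dance */
/*	cancel_delayed_work_sync(&recalc_work);        */
```

This removes the per-thread wait queue, state flags, and start/stop
handshaking entirely, which is most of what ptlrpc_thread provided.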

There are also a few bug-fixes and minor clean-ups.

This series doesn't introduce any new failures in the 'sanity' test
suite.

Thanks,
NeilBrown


---

NeilBrown (17):
staging: lustre: obd_mount: use correct niduuid suffix.
staging: lustre: fix bug in osc_enter_cache_try
staging: lustre: statahead: remove incorrect test on agl_list_empty()
staging: lustre: obdclass: don't require lct_owner to be non-NULL.
staging: lustre: lnet: keep ln_nportals consistent
staging: lustre: get entropy from nid when nid set.
staging: lustre: ptlrpc: change GFP_NOFS to GFP_KERNEL
staging: lustre: obdclass: use workqueue for zombie management.
staging: lustre: ldlm: use delayed_work for pools_recalc
staging: lustre: ptlrpc: use delayed_work in sec_gc
staging: lustre: ptlrpc: use workqueue for pinger
staging: lustre: remove unused flag from ptlrpc_thread
staging: lustre: remove 'ptlrpc_thread usage' for sai_agl_thread
staging: lustre: change sai_thread to sai_task.
staging: lustre: ptlrpc: move thread creation out of module initialization
staging: lustre: allow monolithic builds
Revert "staging: Disable lustre file system for MIPS, SH, and XTENSA"


drivers/staging/lustre/lnet/Kconfig | 2
drivers/staging/lustre/lnet/lnet/api-ni.c | 7 +
drivers/staging/lustre/lnet/lnet/lib-ptl.c | 5 -
drivers/staging/lustre/lustre/Kconfig | 1
.../staging/lustre/lustre/include/lustre_export.h | 2
.../staging/lustre/lustre/include/lustre_import.h | 4
drivers/staging/lustre/lustre/include/lustre_net.h | 14 -
drivers/staging/lustre/lustre/include/obd.h | 2
drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 12 +
drivers/staging/lustre/lustre/ldlm/ldlm_pool.c | 99 +---------
.../staging/lustre/lustre/llite/llite_internal.h | 4
drivers/staging/lustre/lustre/llite/llite_lib.c | 18 ++
drivers/staging/lustre/lustre/llite/statahead.c | 197 ++++++++------------
drivers/staging/lustre/lustre/llite/super25.c | 17 --
drivers/staging/lustre/lustre/obdclass/genops.c | 193 ++------------------
drivers/staging/lustre/lustre/obdclass/lu_object.c | 7 -
drivers/staging/lustre/lustre/obdclass/obd_mount.c | 2
drivers/staging/lustre/lustre/osc/osc_cache.c | 2
drivers/staging/lustre/lustre/ptlrpc/pinger.c | 81 ++------
.../staging/lustre/lustre/ptlrpc/ptlrpc_module.c | 56 ++++--
drivers/staging/lustre/lustre/ptlrpc/sec.c | 6 -
drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c | 2
drivers/staging/lustre/lustre/ptlrpc/sec_gc.c | 90 +++------
drivers/staging/lustre/lustre/ptlrpc/service.c | 4
24 files changed, 259 insertions(+), 568 deletions(-)




2018-03-01 23:33:52

by NeilBrown

Subject: [PATCH 03/17] staging: lustre: statahead: remove incorrect test on agl_list_empty()

Including agl_list_empty() in the wait_event_idle() condition
is pointless as the body of the loop doesn't do anything
about the agl list.
So if the list wasn't empty, the while loop would spin
indefinitely.

The test was removed in the lustre-release commit
672ab0e00d61 ("LU-3270 statahead: small fixes and cleanup"),
but not in the Linux commit 5231f7651c55 ("staging: lustre:
statahead: small fixes and cleanup").

Fixes: 5231f7651c55 ("staging: lustre: statahead: small fixes and cleanup")
Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/llite/statahead.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 6052bfd7ff05..ba00881a5745 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1124,7 +1124,6 @@ static int ll_statahead_thread(void *arg)
while (thread_is_running(sa_thread)) {
wait_event_idle(sa_thread->t_ctl_waitq,
sa_has_callback(sai) ||
- !agl_list_empty(sai) ||
!thread_is_running(sa_thread));

sa_handle_callback(sai);



2018-03-01 23:34:52

by NeilBrown

Subject: [PATCH 02/17] staging: lustre: fix bug in osc_enter_cache_try

The lustre-release patch commit bdc5bb52c554 ("LU-4933 osc:
Automatically increase the max_dirty_mb") changed

- if (cli->cl_dirty + PAGE_CACHE_SIZE <= cli->cl_dirty_max &&
+ if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&

When this patch landed in Linux a couple of years later, it landed as

- if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
+ if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&

which is clearly different ('<=' vs '<'), and allows cl_dirty_pages to
increase beyond cl_dirty_max_pages - which causes a later assertion
to fail.
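The off-by-one is easy to see in a tiny userspace model (hypothetical
demonstration code - only the comparison itself is taken from the patch;
enter_cache_try() here is not the Lustre function):

```c
#include <assert.h>

/* Simplified model of the grant check in osc_enter_cache_try(). */
static unsigned long dirty_pages;
static const unsigned long dirty_max_pages = 4;

static int enter_cache_try(int use_le)
{
	int ok = use_le ? dirty_pages <= dirty_max_pages
			: dirty_pages <  dirty_max_pages;

	if (ok)
		dirty_pages++;	/* consume one page of grant */
	return ok;
}

/* admit pages until the check refuses; return the final count */
static unsigned long fill(int use_le)
{
	dirty_pages = 0;
	while (enter_cache_try(use_le))
		;
	return dirty_pages;
}
```

With '<=' the last successful check happens when dirty_pages already
equals the limit, so the counter ends at dirty_max_pages + 1; with '<'
it can never exceed the limit.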

Fixes: 3147b268400a ("staging: lustre: osc: Automatically increase the max_dirty_mb")
Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/include/obd.h | 2 +-
drivers/staging/lustre/lustre/osc/osc_cache.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 4368f4e9f208..f1233ca7d337 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -191,7 +191,7 @@ struct client_obd {
struct sptlrpc_flavor cl_flvr_mgc; /* fixed flavor of mgc->mgs */

/* the grant values are protected by loi_list_lock below */
- unsigned long cl_dirty_pages; /* all _dirty_ in pahges */
+ unsigned long cl_dirty_pages; /* all _dirty_ in pages */
unsigned long cl_dirty_max_pages; /* allowed w/o rpc */
unsigned long cl_dirty_transit; /* dirty synchronous */
unsigned long cl_avail_grant; /* bytes of credit for ost */
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 1c70a504ee89..459503727ce3 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1529,7 +1529,7 @@ static int osc_enter_cache_try(struct client_obd *cli,
if (rc < 0)
return 0;

- if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
+ if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&
atomic_long_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
osc_consume_write_grant(cli, &oap->oap_brw_page);
if (transient) {



2018-03-01 23:35:16

by NeilBrown

Subject: [PATCH 13/17] staging: lustre: remove 'ptlrpc_thread usage' for sai_agl_thread

Lustre has a 'struct ptlrpc_thread' which provides
control functionality wrapped around kthreads.
None of the functionality used in statahead.c requires
ptlrpc_thread - it can all be done directly with kthreads.

So discard the ptlrpc_thread and just use a task_struct directly.

One particular change worth noting is that in the current
code, the thread performs some start-up actions and then
signals that it is ready to go. In the new code, the thread
is first created, then the startup actions are performed, then
the thread is woken up. This means there is no need to wait
any more than kthread_create() already waits.
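A minimal sketch of that ordering (hypothetical names, not the patch
itself - kthread_create() returns a dormant task that only starts
running when explicitly woken):

```c
	struct task_struct *task;

	task = kthread_create(my_thread_fn, arg, "my_thread_%u", id);
	if (IS_ERR(task))
		return PTR_ERR(task);

	/* perform start-up actions while the thread is still dormant */
	state->task = task;		/* publish before the thread runs */

	wake_up_process(task);		/* now my_thread_fn() starts */
```

Because all initialization completes before the wake-up, the old
"wait until the thread signals it is running" step becomes unnecessary.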

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/lustre/llite/llite_internal.h | 2 -
drivers/staging/lustre/lustre/llite/statahead.c | 78 +++++++-------------
2 files changed, 28 insertions(+), 52 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index f68c2e88f12b..0c2d717fd526 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1071,7 +1071,7 @@ struct ll_statahead_info {
sai_in_readpage:1;/* statahead in readdir() */
wait_queue_head_t sai_waitq; /* stat-ahead wait queue */
struct ptlrpc_thread sai_thread; /* stat-ahead thread */
- struct ptlrpc_thread sai_agl_thread; /* AGL thread */
+ struct task_struct *sai_agl_task; /* AGL thread */
struct list_head sai_interim_entries; /* entries which got async
* stat reply, but not
* instantiated
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index ba00881a5745..39241b952bf4 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -383,7 +383,7 @@ static void ll_agl_add(struct ll_statahead_info *sai,
}

if (added > 0)
- wake_up(&sai->sai_agl_thread.t_ctl_waitq);
+ wake_up_process(sai->sai_agl_task);
}

/* allocate sai */
@@ -404,7 +404,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry)
sai->sai_index = 1;
init_waitqueue_head(&sai->sai_waitq);
init_waitqueue_head(&sai->sai_thread.t_ctl_waitq);
- init_waitqueue_head(&sai->sai_agl_thread.t_ctl_waitq);

INIT_LIST_HEAD(&sai->sai_interim_entries);
INIT_LIST_HEAD(&sai->sai_entries);
@@ -467,7 +466,7 @@ static void ll_sai_put(struct ll_statahead_info *sai)
spin_unlock(&lli->lli_sa_lock);

LASSERT(thread_is_stopped(&sai->sai_thread));
- LASSERT(thread_is_stopped(&sai->sai_agl_thread));
+ LASSERT(sai->sai_agl_task == NULL);
LASSERT(sai->sai_sent == sai->sai_replied);
LASSERT(!sa_has_callback(sai));

@@ -861,35 +860,13 @@ static int ll_agl_thread(void *arg)
struct inode *dir = d_inode(parent);
struct ll_inode_info *plli = ll_i2info(dir);
struct ll_inode_info *clli;
- struct ll_sb_info *sbi = ll_i2sbi(dir);
- struct ll_statahead_info *sai;
- struct ptlrpc_thread *thread;
+ /* We already own this reference, so it is safe to take it without a lock. */
+ struct ll_statahead_info *sai = plli->lli_sai;

- sai = ll_sai_get(dir);
- thread = &sai->sai_agl_thread;
- thread->t_pid = current_pid();
CDEBUG(D_READA, "agl thread started: sai %p, parent %pd\n",
sai, parent);

- atomic_inc(&sbi->ll_agl_total);
- spin_lock(&plli->lli_agl_lock);
- sai->sai_agl_valid = 1;
- if (thread_is_init(thread))
- /* If someone else has changed the thread state
- * (e.g. already changed to SVC_STOPPING), we can't just
- * blindly overwrite that setting.
- */
- thread_set_flags(thread, SVC_RUNNING);
- spin_unlock(&plli->lli_agl_lock);
- wake_up(&thread->t_ctl_waitq);
-
- while (1) {
- wait_event_idle(thread->t_ctl_waitq,
- !list_empty(&sai->sai_agls) ||
- !thread_is_running(thread));
-
- if (!thread_is_running(thread))
- break;
+ while (!kthread_should_stop()) {

spin_lock(&plli->lli_agl_lock);
/* The statahead thread maybe help to process AGL entries,
@@ -904,6 +881,12 @@ static int ll_agl_thread(void *arg)
} else {
spin_unlock(&plli->lli_agl_lock);
}
+
+ set_current_state(TASK_IDLE);
+ if (list_empty(&sai->sai_agls) &&
+ !kthread_should_stop())
+ schedule();
+ __set_current_state(TASK_RUNNING);
}

spin_lock(&plli->lli_agl_lock);
@@ -917,19 +900,16 @@ static int ll_agl_thread(void *arg)
iput(&clli->lli_vfs_inode);
spin_lock(&plli->lli_agl_lock);
}
- thread_set_flags(thread, SVC_STOPPED);
spin_unlock(&plli->lli_agl_lock);
- wake_up(&thread->t_ctl_waitq);
- ll_sai_put(sai);
CDEBUG(D_READA, "agl thread stopped: sai %p, parent %pd\n",
sai, parent);
+ ll_sai_put(sai);
return 0;
}

/* start agl thread */
static void ll_start_agl(struct dentry *parent, struct ll_statahead_info *sai)
{
- struct ptlrpc_thread *thread = &sai->sai_agl_thread;
struct ll_inode_info *plli;
struct task_struct *task;

@@ -937,16 +917,22 @@ static void ll_start_agl(struct dentry *parent, struct ll_statahead_info *sai)
sai, parent);

plli = ll_i2info(d_inode(parent));
- task = kthread_run(ll_agl_thread, parent, "ll_agl_%u",
- plli->lli_opendir_pid);
+ task = kthread_create(ll_agl_thread, parent, "ll_agl_%u",
+ plli->lli_opendir_pid);
if (IS_ERR(task)) {
CERROR("can't start ll_agl thread, rc: %ld\n", PTR_ERR(task));
- thread_set_flags(thread, SVC_STOPPED);
return;
}

- wait_event_idle(thread->t_ctl_waitq,
- thread_is_running(thread) || thread_is_stopped(thread));
+ sai->sai_agl_task = task;
+ atomic_inc(&ll_i2sbi(d_inode(parent))->ll_agl_total);
+ spin_lock(&plli->lli_agl_lock);
+ sai->sai_agl_valid = 1;
+ spin_unlock(&plli->lli_agl_lock);
+ /* Get an extra reference that the thread holds */
+ ll_sai_get(d_inode(parent));
+
+ wake_up_process(task);
}

/* statahead thread main function */
@@ -958,7 +944,6 @@ static int ll_statahead_thread(void *arg)
struct ll_sb_info *sbi = ll_i2sbi(dir);
struct ll_statahead_info *sai;
struct ptlrpc_thread *sa_thread;
- struct ptlrpc_thread *agl_thread;
struct page *page = NULL;
__u64 pos = 0;
int first = 0;
@@ -967,7 +952,6 @@ static int ll_statahead_thread(void *arg)

sai = ll_sai_get(dir);
sa_thread = &sai->sai_thread;
- agl_thread = &sai->sai_agl_thread;
sa_thread->t_pid = current_pid();
CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n",
sai, parent);
@@ -1129,21 +1113,13 @@ static int ll_statahead_thread(void *arg)
sa_handle_callback(sai);
}
out:
- if (sai->sai_agl_valid) {
- spin_lock(&lli->lli_agl_lock);
- thread_set_flags(agl_thread, SVC_STOPPING);
- spin_unlock(&lli->lli_agl_lock);
- wake_up(&agl_thread->t_ctl_waitq);
+ if (sai->sai_agl_task) {
+ kthread_stop(sai->sai_agl_task);

CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
- sai, (unsigned int)agl_thread->t_pid);
- wait_event_idle(agl_thread->t_ctl_waitq,
- thread_is_stopped(agl_thread));
- } else {
- /* Set agl_thread flags anyway. */
- thread_set_flags(agl_thread, SVC_STOPPED);
+ sai, (unsigned int)sai->sai_agl_task->pid);
+ sai->sai_agl_task = NULL;
}
-
/*
* wait for inflight statahead RPCs to finish, and then we can free sai
* safely because statahead RPC will access sai data



2018-03-01 23:35:24

by NeilBrown

Subject: [PATCH 15/17] staging: lustre: ptlrpc: move thread creation out of module initialization

When the ptlrpc module is loaded, it starts the pinger thread and
calls LNetNIInit which starts various threads.

We don't need these threads until the module is actually being
used, such as when a lustre filesystem is mounted.

So move the thread creation into a new ptlrpc_inc_ref() (modeled on
ptlrpcd_inc_ref()), and call that when needed, such as at mount time.
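The refcounted start-up pattern used here can be modeled in userspace
(hypothetical names; pthread mutex stands in for the kernel mutex, and
the start/stop counters stand in for ptlrpc_init_portals() and friends):

```c
#include <pthread.h>

/* Counters standing in for the real start-up/tear-down work. */
static int inits, exits;
static int start_portals(void) { inits++; return 0; }
static void stop_portals(void) { exits++; }

static pthread_mutex_t startup_lock = PTHREAD_MUTEX_INITIALIZER;
static int active;

/* First caller performs the expensive start-up; later callers just
 * bump the count.  On failure the count is rolled back so a retry
 * attempts the start-up again.
 */
int subsys_inc_ref(void)
{
	int rc = 0;

	pthread_mutex_lock(&startup_lock);
	if (active++ == 0) {
		rc = start_portals();
		if (rc)
			active--;
	}
	pthread_mutex_unlock(&startup_lock);
	return rc;
}

/* Last caller out tears everything down. */
void subsys_dec_ref(void)
{
	pthread_mutex_lock(&startup_lock);
	if (--active == 0)
		stop_portals();
	pthread_mutex_unlock(&startup_lock);
}
```

With this shape the threads exist only between the first mount (or
ldlm user) and the last unmount, rather than for the lifetime of the
module.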

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/include/lustre_net.h | 3 +
drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 12 ++++
drivers/staging/lustre/lustre/llite/llite_lib.c | 18 +++++-
.../staging/lustre/lustre/ptlrpc/ptlrpc_module.c | 56 +++++++++++++-------
4 files changed, 65 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index 108683c54127..d35ae0cda8d2 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -1804,6 +1804,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd);
*/
void ptlrpc_request_committed(struct ptlrpc_request *req, int force);

+int ptlrpc_inc_ref(void);
+void ptlrpc_dec_ref(void);
+
void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
struct ptlrpc_client *);
struct ptlrpc_connection *ptlrpc_uuid_to_connection(struct obd_uuid *uuid);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index 58913e628124..c772c68e5a49 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -869,6 +869,10 @@ int ldlm_get_ref(void)
{
int rc = 0;

+ rc = ptlrpc_inc_ref();
+ if (rc)
+ return rc;
+
mutex_lock(&ldlm_ref_mutex);
if (++ldlm_refcount == 1) {
rc = ldlm_setup();
@@ -877,14 +881,18 @@ int ldlm_get_ref(void)
}
mutex_unlock(&ldlm_ref_mutex);

+ if (rc)
+ ptlrpc_dec_ref();
+
return rc;
}

void ldlm_put_ref(void)
{
+ int rc = 0;
mutex_lock(&ldlm_ref_mutex);
if (ldlm_refcount == 1) {
- int rc = ldlm_cleanup();
+ rc = ldlm_cleanup();

if (rc)
CERROR("ldlm_cleanup failed: %d\n", rc);
@@ -894,6 +902,8 @@ void ldlm_put_ref(void)
ldlm_refcount--;
}
mutex_unlock(&ldlm_ref_mutex);
+ if (!rc)
+ ptlrpc_dec_ref();
}

static ssize_t cancel_unused_locks_before_replay_show(struct kobject *kobj,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 844182ad7dd7..706b14bf8981 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -879,9 +879,15 @@ int ll_fill_super(struct super_block *sb)

CDEBUG(D_VFSTRACE, "VFS Op: sb %p\n", sb);

+ err = ptlrpc_inc_ref();
+ if (err)
+ return err;
+
cfg = kzalloc(sizeof(*cfg), GFP_NOFS);
- if (!cfg)
- return -ENOMEM;
+ if (!cfg) {
+ err = -ENOMEM;
+ goto out_put;
+ }

try_module_get(THIS_MODULE);

@@ -891,7 +897,8 @@ int ll_fill_super(struct super_block *sb)
if (!sbi) {
module_put(THIS_MODULE);
kfree(cfg);
- return -ENOMEM;
+ err = -ENOMEM;
+ goto out_put;
}

err = ll_options(lsi->lsi_lmd->lmd_opts, &sbi->ll_flags);
@@ -958,6 +965,9 @@ int ll_fill_super(struct super_block *sb)
LCONSOLE_WARN("Mounted %s\n", profilenm);

kfree(cfg);
+out_put:
+ if (err)
+ ptlrpc_dec_ref();
return err;
} /* ll_fill_super */

@@ -1028,6 +1038,8 @@ void ll_put_super(struct super_block *sb)
cl_env_cache_purge(~0);

module_put(THIS_MODULE);
+
+ ptlrpc_dec_ref();
} /* client_put_super */

struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
index 131fc6d9646e..38923418669f 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
@@ -45,6 +45,42 @@ extern spinlock_t ptlrpc_last_xid_lock;
extern spinlock_t ptlrpc_rs_debug_lock;
#endif

+DEFINE_MUTEX(ptlrpc_startup);
+static int ptlrpc_active = 0;
+
+int ptlrpc_inc_ref(void)
+{
+ int rc = 0;
+
+ mutex_lock(&ptlrpc_startup);
+ if (ptlrpc_active++ == 0) {
+ ptlrpc_put_connection_superhack = ptlrpc_connection_put;
+
+ rc = ptlrpc_init_portals();
+ if (!rc) {
+ rc= ptlrpc_start_pinger();
+ if (rc)
+ ptlrpc_exit_portals();
+ }
+ if (rc)
+ ptlrpc_active--;
+ }
+ mutex_unlock(&ptlrpc_startup);
+ return rc;
+}
+EXPORT_SYMBOL(ptlrpc_inc_ref);
+
+void ptlrpc_dec_ref(void)
+{
+ mutex_lock(&ptlrpc_startup);
+ if (--ptlrpc_active == 0) {
+ ptlrpc_stop_pinger();
+ ptlrpc_exit_portals();
+ }
+ mutex_unlock(&ptlrpc_startup);
+}
+EXPORT_SYMBOL(ptlrpc_dec_ref);
+
static int __init ptlrpc_init(void)
{
int rc, cleanup_phase = 0;
@@ -71,24 +107,12 @@ static int __init ptlrpc_init(void)
if (rc)
goto cleanup;

- cleanup_phase = 2;
- rc = ptlrpc_init_portals();
- if (rc)
- goto cleanup;
-
cleanup_phase = 3;

rc = ptlrpc_connection_init();
if (rc)
goto cleanup;

- cleanup_phase = 4;
- ptlrpc_put_connection_superhack = ptlrpc_connection_put;
-
- rc = ptlrpc_start_pinger();
- if (rc)
- goto cleanup;
-
cleanup_phase = 5;
rc = ldlm_init();
if (rc)
@@ -122,15 +146,9 @@ static int __init ptlrpc_init(void)
ldlm_exit();
/* Fall through */
case 5:
- ptlrpc_stop_pinger();
- /* Fall through */
- case 4:
ptlrpc_connection_fini();
/* Fall through */
case 3:
- ptlrpc_exit_portals();
- /* Fall through */
- case 2:
ptlrpc_request_cache_fini();
/* Fall through */
case 1:
@@ -150,8 +168,6 @@ static void __exit ptlrpc_exit(void)
ptlrpc_nrs_fini();
sptlrpc_fini();
ldlm_exit();
- ptlrpc_stop_pinger();
- ptlrpc_exit_portals();
ptlrpc_request_cache_fini();
ptlrpc_hr_fini();
ptlrpc_connection_fini();



2018-03-01 23:35:28

by NeilBrown

Subject: [PATCH 06/17] staging: lustre: get entropy from nid when nid set.

When the 'lustre' module is loaded, it gets a list of
net devices and uses the node ids to add entropy
to the prng. This means that the network interfaces need
to be configured before the module is loaded, which prevents
the module from being compiled into a monolithic kernel.

So move this entropy addition to the moment when
the interface is imported to LNet and the node id is first known.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lnet/lnet/api-ni.c | 7 +++++++
drivers/staging/lustre/lustre/llite/super25.c | 17 +----------------
2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 48d25ccadbb3..90266be0132d 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1214,6 +1214,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
struct lnet_lnd *lnd;
struct lnet_tx_queue *tq;
int i;
+ u32 seed;

lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));

@@ -1352,6 +1353,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
tq->tq_credits = lnet_ni_tq_credits(ni);
}

+ /* Nodes with small feet have little entropy. The NID for this
+ * node gives the most entropy in the low bits.
+ */
+ seed = LNET_NIDADDR(ni->ni_nid);
+ add_device_randomness(&seed, sizeof(seed));
+
CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
diff --git a/drivers/staging/lustre/lustre/llite/super25.c b/drivers/staging/lustre/lustre/llite/super25.c
index 9b0bb3541a84..861e7a60f408 100644
--- a/drivers/staging/lustre/lustre/llite/super25.c
+++ b/drivers/staging/lustre/lustre/llite/super25.c
@@ -85,8 +85,7 @@ MODULE_ALIAS_FS("lustre");

static int __init lustre_init(void)
{
- struct lnet_process_id lnet_id;
- int i, rc;
+ int rc;

BUILD_BUG_ON(sizeof(LUSTRE_VOLATILE_HDR) !=
LUSTRE_VOLATILE_HDR_LEN + 1);
@@ -125,20 +124,6 @@ static int __init lustre_init(void)
goto out_debugfs;
}

- /* Nodes with small feet have little entropy. The NID for this
- * node gives the most entropy in the low bits
- */
- for (i = 0;; i++) {
- u32 seed;
-
- if (LNetGetId(i, &lnet_id) == -ENOENT)
- break;
- if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) {
- seed = LNET_NIDADDR(lnet_id.nid);
- add_device_randomness(&seed, sizeof(seed));
- }
- }
-
rc = vvp_global_init();
if (rc != 0)
goto out_sysfs;



2018-03-01 23:35:31

by NeilBrown

Subject: [PATCH 17/17] Revert "staging: Disable lustre file system for MIPS, SH, and XTENSA"

This reverts commit 16f1eeb660bd2bfd223704ee6350706b39c55a7a.

The reason for this patch was that lustre used copy_from_user_page.
Commit 76133e66b141 ("staging/lustre: Replace jobid acquiring with per
node setting") removed that usage.
So the arch restrictions can go.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/Kconfig | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
index c669c8fa0cc6..ccb78a945995 100644
--- a/drivers/staging/lustre/lustre/Kconfig
+++ b/drivers/staging/lustre/lustre/Kconfig
@@ -1,6 +1,5 @@
config LUSTRE_FS
tristate "Lustre file system client support"
- depends on !MIPS && !XTENSA && !SUPERH
depends on LNET
select CRYPTO
select CRYPTO_CRC32



2018-03-01 23:36:05

by NeilBrown

Subject: [PATCH 14/17] staging: lustre: change sai_thread to sai_task.

Rather than allocating a ptlrpc_thread for the
stat-ahead thread, just use the task_struct provided
by kthreads directly.

As nothing ever waits for the sai_task, it must call do_exit()
directly rather than simply return from the function.
Also it cannot use kthread_should_stop() to know when to stop.

There is one caller which can ask it to stop so we need a simple
signaling mechanism. I've chosen to set ->sai_task to NULL
when the thread should finish up. The thread notices this and
cleans up and exits.
lli_sa_lock is used to avoid races between waking up the process
and the process exiting.
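The resulting protocol can be sketched as follows (hypothetical names,
not the patch code - a kernel-API sketch that cannot build standalone).
The worker re-checks its condition after setting TASK_IDLE, and the
stopper clears the pointer under the same lock the wake-ups take:

```c
/* worker side: */
	while (state->task) {
		do_pending_work(state);

		set_current_state(TASK_IDLE);
		if (!work_pending(state) && state->task)
			schedule();		/* no lost-wakeup window */
		__set_current_state(TASK_RUNNING);
	}
	do_exit(0);	/* nothing joins this thread */

/* stopping side: */
	spin_lock(&state->lock);
	if (state->task) {
		wake_up_process(state->task);
		state->task = NULL;	/* worker sees this and exits */
	}
	spin_unlock(&state->lock);
```

Setting TASK_IDLE before the final condition check mirrors what
wait_event_idle() did, so a wake_up_process() that races with the
check still prevents the schedule() from sleeping indefinitely.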

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/lustre/llite/llite_internal.h | 2
drivers/staging/lustre/lustre/llite/statahead.c | 118 +++++++++-----------
2 files changed, 54 insertions(+), 66 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 0c2d717fd526..d46bcf71b273 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1070,7 +1070,7 @@ struct ll_statahead_info {
sai_agl_valid:1,/* AGL is valid for the dir */
sai_in_readpage:1;/* statahead in readdir() */
wait_queue_head_t sai_waitq; /* stat-ahead wait queue */
- struct ptlrpc_thread sai_thread; /* stat-ahead thread */
+ struct task_struct *sai_task; /* stat-ahead thread */
struct task_struct *sai_agl_task; /* AGL thread */
struct list_head sai_interim_entries; /* entries which got async
* stat reply, but not
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 39241b952bf4..155ce3cf6f60 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -267,7 +267,7 @@ sa_kill(struct ll_statahead_info *sai, struct sa_entry *entry)

/* called by scanner after use, sa_entry will be killed */
static void
-sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
+sa_put(struct ll_statahead_info *sai, struct sa_entry *entry, struct ll_inode_info *lli)
{
struct sa_entry *tmp, *next;

@@ -295,7 +295,11 @@ sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
sa_kill(sai, tmp);
}

- wake_up(&sai->sai_thread.t_ctl_waitq);
+ spin_lock(&lli->lli_sa_lock);
+ if (sai->sai_task)
+ wake_up_process(sai->sai_task);
+ spin_unlock(&lli->lli_sa_lock);
+
}

/*
@@ -403,7 +407,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry)
sai->sai_max = LL_SA_RPC_MIN;
sai->sai_index = 1;
init_waitqueue_head(&sai->sai_waitq);
- init_waitqueue_head(&sai->sai_thread.t_ctl_waitq);

INIT_LIST_HEAD(&sai->sai_interim_entries);
INIT_LIST_HEAD(&sai->sai_entries);
@@ -465,7 +468,7 @@ static void ll_sai_put(struct ll_statahead_info *sai)
lli->lli_sai = NULL;
spin_unlock(&lli->lli_sa_lock);

- LASSERT(thread_is_stopped(&sai->sai_thread));
+ LASSERT(sai->sai_task == NULL);
LASSERT(sai->sai_agl_task == NULL);
LASSERT(sai->sai_sent == sai->sai_replied);
LASSERT(!sa_has_callback(sai));
@@ -646,7 +649,6 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
struct ll_inode_info *lli = ll_i2info(dir);
struct ll_statahead_info *sai = lli->lli_sai;
struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata;
- wait_queue_head_t *waitq = NULL;
__u64 handle = 0;

if (it_disposition(it, DISP_LOOKUP_NEG))
@@ -657,7 +659,6 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
* sai should be always valid, no need to refcount
*/
LASSERT(sai);
- LASSERT(!thread_is_stopped(&sai->sai_thread));
LASSERT(entry);

CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
@@ -681,8 +682,9 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
spin_lock(&lli->lli_sa_lock);
if (rc) {
if (__sa_make_ready(sai, entry, rc))
- waitq = &sai->sai_waitq;
+ wake_up(&sai->sai_waitq);
} else {
+ int first = 0;
entry->se_minfo = minfo;
entry->se_req = ptlrpc_request_addref(req);
/*
@@ -693,14 +695,15 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
*/
entry->se_handle = handle;
if (!sa_has_callback(sai))
- waitq = &sai->sai_thread.t_ctl_waitq;
+ first = 1;

list_add_tail(&entry->se_list, &sai->sai_interim_entries);
+
+ if (first && sai->sai_task)
+ wake_up_process(sai->sai_task);
}
sai->sai_replied++;

- if (waitq)
- wake_up(waitq);
spin_unlock(&lli->lli_sa_lock);

return rc;
@@ -942,17 +945,13 @@ static int ll_statahead_thread(void *arg)
struct inode *dir = d_inode(parent);
struct ll_inode_info *lli = ll_i2info(dir);
struct ll_sb_info *sbi = ll_i2sbi(dir);
- struct ll_statahead_info *sai;
- struct ptlrpc_thread *sa_thread;
+ struct ll_statahead_info *sai = lli->lli_sai;
struct page *page = NULL;
__u64 pos = 0;
int first = 0;
int rc = 0;
struct md_op_data *op_data;

- sai = ll_sai_get(dir);
- sa_thread = &sai->sai_thread;
- sa_thread->t_pid = current_pid();
CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n",
sai, parent);

@@ -965,21 +964,7 @@ static int ll_statahead_thread(void *arg)

op_data->op_max_pages = ll_i2sbi(dir)->ll_md_brw_pages;

- if (sbi->ll_flags & LL_SBI_AGL_ENABLED)
- ll_start_agl(parent, sai);
-
- atomic_inc(&sbi->ll_sa_total);
- spin_lock(&lli->lli_sa_lock);
- if (thread_is_init(sa_thread))
- /* If someone else has changed the thread state
- * (e.g. already changed to SVC_STOPPING), we can't just
- * blindly overwrite that setting.
- */
- thread_set_flags(sa_thread, SVC_RUNNING);
- spin_unlock(&lli->lli_sa_lock);
- wake_up(&sa_thread->t_ctl_waitq);
-
- while (pos != MDS_DIR_END_OFF && thread_is_running(sa_thread)) {
+ while (pos != MDS_DIR_END_OFF && sai->sai_task) {
struct lu_dirpage *dp;
struct lu_dirent *ent;

@@ -996,7 +981,7 @@ static int ll_statahead_thread(void *arg)

dp = page_address(page);
for (ent = lu_dirent_start(dp);
- ent && thread_is_running(sa_thread) && !sa_low_hit(sai);
+ ent && sai->sai_task && !sa_low_hit(sai);
ent = lu_dirent_next(ent)) {
struct lu_fid fid;
__u64 hash;
@@ -1046,13 +1031,7 @@ static int ll_statahead_thread(void *arg)

fid_le_to_cpu(&fid, &ent->lde_fid);

- /* wait for spare statahead window */
do {
- wait_event_idle(sa_thread->t_ctl_waitq,
- !sa_sent_full(sai) ||
- sa_has_callback(sai) ||
- !list_empty(&sai->sai_agls) ||
- !thread_is_running(sa_thread));
sa_handle_callback(sai);

spin_lock(&lli->lli_agl_lock);
@@ -1072,8 +1051,16 @@ static int ll_statahead_thread(void *arg)
spin_lock(&lli->lli_agl_lock);
}
spin_unlock(&lli->lli_agl_lock);
- } while (sa_sent_full(sai) &&
- thread_is_running(sa_thread));
+
+ set_current_state(TASK_IDLE);
+ if (sa_sent_full(sai) &&
+ !sa_has_callback(sai) &&
+ agl_list_empty(sai) &&
+ sai->sai_task)
+ /* wait for spare statahead window */
+ schedule();
+ __set_current_state(TASK_RUNNING);
+ } while (sa_sent_full(sai) && sai->sai_task);

sa_statahead(parent, name, namelen, &fid);
}
@@ -1096,7 +1083,7 @@ static int ll_statahead_thread(void *arg)

if (rc < 0) {
spin_lock(&lli->lli_sa_lock);
- thread_set_flags(sa_thread, SVC_STOPPING);
+ sai->sai_task = NULL;
lli->lli_sa_enabled = 0;
spin_unlock(&lli->lli_sa_lock);
}
@@ -1105,12 +1092,14 @@ static int ll_statahead_thread(void *arg)
* statahead is finished, but statahead entries need to be cached, wait
* for file release to stop me.
*/
- while (thread_is_running(sa_thread)) {
- wait_event_idle(sa_thread->t_ctl_waitq,
- sa_has_callback(sai) ||
- !thread_is_running(sa_thread));
-
+ while (sai->sai_task) {
sa_handle_callback(sai);
+
+ set_current_state(TASK_IDLE);
+ if (!sa_has_callback(sai) &&
+ sai->sai_task)
+ schedule();
+ __set_current_state(TASK_RUNNING);
}
out:
if (sai->sai_agl_task) {
@@ -1126,26 +1115,23 @@ static int ll_statahead_thread(void *arg)
*/
while (sai->sai_sent != sai->sai_replied) {
/* in case we're not woken up, timeout wait */
- wait_event_idle_timeout(sa_thread->t_ctl_waitq,
- sai->sai_sent == sai->sai_replied,
- HZ>>3);
+ schedule_timeout_idle(HZ>>3);
}

/* release resources held by statahead RPCs */
sa_handle_callback(sai);

- spin_lock(&lli->lli_sa_lock);
- thread_set_flags(sa_thread, SVC_STOPPED);
- spin_unlock(&lli->lli_sa_lock);
-
CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n",
sai, parent);

+ spin_lock(&lli->lli_sa_lock);
+ sai->sai_task = NULL;
+ spin_unlock(&lli->lli_sa_lock);
+
wake_up(&sai->sai_waitq);
- wake_up(&sa_thread->t_ctl_waitq);
ll_sai_put(sai);

- return rc;
+ do_exit(rc);
}

/* authorize opened dir handle @key to statahead */
@@ -1187,13 +1173,13 @@ void ll_deauthorize_statahead(struct inode *dir, void *key)
lli->lli_opendir_pid = 0;
lli->lli_sa_enabled = 0;
sai = lli->lli_sai;
- if (sai && thread_is_running(&sai->sai_thread)) {
+ if (sai && sai->sai_task) {
/*
* statahead thread may not quit yet because it needs to cache
* entries, now it's time to tell it to quit.
*/
- thread_set_flags(&sai->sai_thread, SVC_STOPPING);
- wake_up(&sai->sai_thread.t_ctl_waitq);
+ wake_up_process(sai->sai_task);
+ sai->sai_task = NULL;
}
spin_unlock(&lli->lli_sa_lock);
}
@@ -1463,7 +1449,7 @@ static int revalidate_statahead_dentry(struct inode *dir,
*/
ldd = ll_d2d(*dentryp);
ldd->lld_sa_generation = lli->lli_sa_generation;
- sa_put(sai, entry);
+ sa_put(sai, entry, lli);
return rc;
}

@@ -1483,7 +1469,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry)
{
struct ll_inode_info *lli = ll_i2info(dir);
struct ll_statahead_info *sai = NULL;
- struct ptlrpc_thread *thread;
struct task_struct *task;
struct dentry *parent = dentry->d_parent;
int rc;
@@ -1523,18 +1508,21 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry)
CDEBUG(D_READA, "start statahead thread: [pid %d] [parent %pd]\n",
current_pid(), parent);

- task = kthread_run(ll_statahead_thread, parent, "ll_sa_%u",
- lli->lli_opendir_pid);
- thread = &sai->sai_thread;
+ task = kthread_create(ll_statahead_thread, parent, "ll_sa_%u",
+ lli->lli_opendir_pid);
if (IS_ERR(task)) {
rc = PTR_ERR(task);
CERROR("can't start ll_sa thread, rc : %d\n", rc);
goto out;
}

- wait_event_idle(thread->t_ctl_waitq,
- thread_is_running(thread) || thread_is_stopped(thread));
- ll_sai_put(sai);
+ if (ll_i2sbi(parent->d_inode)->ll_flags & LL_SBI_AGL_ENABLED)
+ ll_start_agl(parent, sai);
+
+ atomic_inc(&ll_i2sbi(parent->d_inode)->ll_sa_total);
+ sai->sai_task = task;
+
+ wake_up_process(task);

/*
* We don't stat-ahead for the first dirent since we are already in



2018-03-01 23:36:54

by NeilBrown

Subject: [PATCH 10/17] staging: lustre: ptlrpc: use delayed_work in sec_gc

The garbage collection for security contexts currently
has a dedicated kthread which wakes up every 30 minutes
to discard old garbage.

Replace this with a simple delayed_work item on the
system work queue.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/sec_gc.c | 90 ++++++++-----------------
1 file changed, 28 insertions(+), 62 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
index 48f1a72afd77..2c8bad7b7877 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
@@ -55,7 +55,6 @@ static spinlock_t sec_gc_list_lock;
static LIST_HEAD(sec_gc_ctx_list);
static spinlock_t sec_gc_ctx_list_lock;

-static struct ptlrpc_thread sec_gc_thread;
static atomic_t sec_gc_wait_del = ATOMIC_INIT(0);

void sptlrpc_gc_add_sec(struct ptlrpc_sec *sec)
@@ -139,86 +138,53 @@ static void sec_do_gc(struct ptlrpc_sec *sec)
sec->ps_gc_next = ktime_get_real_seconds() + sec->ps_gc_interval;
}

-static int sec_gc_main(void *arg)
-{
- struct ptlrpc_thread *thread = arg;
-
- unshare_fs_struct();
+static void sec_gc_main(struct work_struct *ws);
+static DECLARE_DELAYED_WORK(sec_gc_work, sec_gc_main);

- /* Record that the thread is running */
- thread_set_flags(thread, SVC_RUNNING);
- wake_up(&thread->t_ctl_waitq);
-
- while (1) {
- struct ptlrpc_sec *sec;
+static void sec_gc_main(struct work_struct *ws)
+{
+ struct ptlrpc_sec *sec;

- sec_process_ctx_list();
+ sec_process_ctx_list();
again:
- /* go through sec list do gc.
- * FIXME here we iterate through the whole list each time which
- * is not optimal. we perhaps want to use balanced binary tree
- * to trace each sec as order of expiry time.
- * another issue here is we wakeup as fixed interval instead of
- * according to each sec's expiry time
+ /* go through sec list do gc.
+ * FIXME here we iterate through the whole list each time which
+ * is not optimal. we perhaps want to use balanced binary tree
+ * to trace each sec as order of expiry time.
+ * another issue here is we wakeup as fixed interval instead of
+ * according to each sec's expiry time
+ */
+ mutex_lock(&sec_gc_mutex);
+ list_for_each_entry(sec, &sec_gc_list, ps_gc_list) {
+ /* if someone is waiting to be deleted, let it
+ * proceed as soon as possible.
*/
- mutex_lock(&sec_gc_mutex);
- list_for_each_entry(sec, &sec_gc_list, ps_gc_list) {
- /* if someone is waiting to be deleted, let it
- * proceed as soon as possible.
- */
- if (atomic_read(&sec_gc_wait_del)) {
- CDEBUG(D_SEC, "deletion pending, start over\n");
- mutex_unlock(&sec_gc_mutex);
- goto again;
- }
-
- sec_do_gc(sec);
+ if (atomic_read(&sec_gc_wait_del)) {
+ CDEBUG(D_SEC, "deletion pending, start over\n");
+ mutex_unlock(&sec_gc_mutex);
+ goto again;
}
- mutex_unlock(&sec_gc_mutex);
-
- /* check ctx list again before sleep */
- sec_process_ctx_list();
- wait_event_idle_timeout(thread->t_ctl_waitq,
- thread_is_stopping(thread),
- SEC_GC_INTERVAL * HZ);

- if (thread_test_and_clear_flags(thread, SVC_STOPPING))
- break;
+ sec_do_gc(sec);
}
+ mutex_unlock(&sec_gc_mutex);

- thread_set_flags(thread, SVC_STOPPED);
- wake_up(&thread->t_ctl_waitq);
- return 0;
+ /* check ctx list again before sleep */
+ sec_process_ctx_list();
+ schedule_delayed_work(&sec_gc_work, SEC_GC_INTERVAL * HZ);
}

int sptlrpc_gc_init(void)
{
- struct task_struct *task;
-
mutex_init(&sec_gc_mutex);
spin_lock_init(&sec_gc_list_lock);
spin_lock_init(&sec_gc_ctx_list_lock);

- /* initialize thread control */
- memset(&sec_gc_thread, 0, sizeof(sec_gc_thread));
- init_waitqueue_head(&sec_gc_thread.t_ctl_waitq);
-
- task = kthread_run(sec_gc_main, &sec_gc_thread, "sptlrpc_gc");
- if (IS_ERR(task)) {
- CERROR("can't start gc thread: %ld\n", PTR_ERR(task));
- return PTR_ERR(task);
- }
-
- wait_event_idle(sec_gc_thread.t_ctl_waitq,
- thread_is_running(&sec_gc_thread));
+ schedule_delayed_work(&sec_gc_work, 0);
return 0;
}

void sptlrpc_gc_fini(void)
{
- thread_set_flags(&sec_gc_thread, SVC_STOPPING);
- wake_up(&sec_gc_thread.t_ctl_waitq);
-
- wait_event_idle(sec_gc_thread.t_ctl_waitq,
- thread_is_stopped(&sec_gc_thread));
+ cancel_delayed_work_sync(&sec_gc_work);
}



2018-03-01 23:38:22

by NeilBrown

Subject: [PATCH 04/17] staging: lustre: obdclass: don't require lct_owner to be non-NULL.

Some places in lu_object.c allow lct_owner to be NULL, implying
that the code is built in to the kernel (not a module), but
two places don't. This prevents us from building lustre into
the kernel.

So remove the requirement and always allow lct_owner to be NULL.

This requires removing an "assert" that the module count is positive,
but this is redundant as module_put() already does the necessary test.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/lu_object.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index cca688175d2d..880800e78c52 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1380,12 +1380,8 @@ static void key_fini(struct lu_context *ctx, int index)
lu_ref_del(&key->lct_reference, "ctx", ctx);
atomic_dec(&key->lct_used);

- if ((ctx->lc_tags & LCT_NOREF) == 0) {
-#ifdef CONFIG_MODULE_UNLOAD
- LINVRNT(module_refcount(key->lct_owner) > 0);
-#endif
+ if ((ctx->lc_tags & LCT_NOREF) == 0)
module_put(key->lct_owner);
- }
ctx->lc_value[index] = NULL;
}
}
@@ -1619,7 +1615,6 @@ static int keys_fill(struct lu_context *ctx)
LINVRNT(key->lct_init);
LINVRNT(key->lct_index == i);

- LASSERT(key->lct_owner);
if (!(ctx->lc_tags & LCT_NOREF) &&
!try_module_get(key->lct_owner)) {
/* module is unloading, skip this key */



2018-03-02 00:11:02

by NeilBrown

Subject: [PATCH 01/17] staging: lustre: obd_mount: use correct niduuid suffix.

Commit 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf") moved
some sprintf() calls earlier in the code to combine them with
memory allocation and create kasprintf() calls.

In one case, this code movement moved the sprintf to a location where the
values being formatted were different.
In particular
sprintf(niduuid, "%s_%x", mgcname, i);
was moved from *after* the line
i = 0;
to a location where the value of 'i' was at least 1.

This causes the wrong name to be formatted, and triggers

CERROR("del MDC UUID %s failed: rc = %d\n",
niduuid, rc);

at unmount time.

So use '0' instead of 'i'.

Fixes: 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf")
Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/obdclass/obd_mount.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index acc1ea773c9c..f5e8214ac37b 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -243,7 +243,7 @@ int lustre_start_mgc(struct super_block *sb)
libcfs_nid2str_r(nid, nidstr, sizeof(nidstr));
mgcname = kasprintf(GFP_NOFS,
"%s%s", LUSTRE_MGC_OBDNAME, nidstr);
- niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, i);
+ niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, 0);
if (!mgcname || !niduuid) {
rc = -ENOMEM;
goto out_free;



2018-03-02 00:11:32

by NeilBrown

Subject: [PATCH 09/17] staging: lustre: ldlm: use delayed_work for pools_recalc

ldlm currently has a kthread which wakes up every so often
and calls ldlm_pools_recalc().
The thread is started and stopped, but no other external interactions
happen.

This can trivially be replaced by a delayed_work if we have
ldlm_pools_recalc() reschedule the work itself, rather than just
reporting when it should next run.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_pool.c | 99 +++---------------------
1 file changed, 11 insertions(+), 88 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
index a0e486b57e08..53b8f33e54b5 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
@@ -784,9 +784,6 @@ static int ldlm_pool_granted(struct ldlm_pool *pl)
return atomic_read(&pl->pl_granted);
}

-static struct ptlrpc_thread *ldlm_pools_thread;
-static struct completion ldlm_pools_comp;
-
/*
* count locks from all namespaces (if possible). Returns number of
* cached locks.
@@ -899,8 +896,12 @@ static unsigned long ldlm_pools_cli_scan(struct shrinker *s,
sc->gfp_mask);
}

-static int ldlm_pools_recalc(enum ldlm_side client)
+static void ldlm_pools_recalc(struct work_struct *ws);
+static DECLARE_DELAYED_WORK(ldlm_recalc_pools, ldlm_pools_recalc);
+
+static void ldlm_pools_recalc(struct work_struct *ws)
{
+ enum ldlm_side client = LDLM_NAMESPACE_CLIENT;
struct ldlm_namespace *ns;
struct ldlm_namespace *ns_old = NULL;
/* seconds of sleep if no active namespaces */
@@ -982,92 +983,19 @@ static int ldlm_pools_recalc(enum ldlm_side client)
/* Wake up the blocking threads from time to time. */
ldlm_bl_thread_wakeup();

- return time;
-}
-
-static int ldlm_pools_thread_main(void *arg)
-{
- struct ptlrpc_thread *thread = (struct ptlrpc_thread *)arg;
- int c_time;
-
- thread_set_flags(thread, SVC_RUNNING);
- wake_up(&thread->t_ctl_waitq);
-
- CDEBUG(D_DLMTRACE, "%s: pool thread starting, process %d\n",
- "ldlm_poold", current_pid());
-
- while (1) {
- /*
- * Recal all pools on this tick.
- */
- c_time = ldlm_pools_recalc(LDLM_NAMESPACE_CLIENT);
-
- /*
- * Wait until the next check time, or until we're
- * stopped.
- */
- wait_event_idle_timeout(thread->t_ctl_waitq,
- thread_is_stopping(thread) ||
- thread_is_event(thread),
- c_time * HZ);
-
- if (thread_test_and_clear_flags(thread, SVC_STOPPING))
- break;
- thread_test_and_clear_flags(thread, SVC_EVENT);
- }
-
- thread_set_flags(thread, SVC_STOPPED);
- wake_up(&thread->t_ctl_waitq);
-
- CDEBUG(D_DLMTRACE, "%s: pool thread exiting, process %d\n",
- "ldlm_poold", current_pid());
-
- complete_and_exit(&ldlm_pools_comp, 0);
+ schedule_delayed_work(&ldlm_recalc_pools, time * HZ);
}

static int ldlm_pools_thread_start(void)
{
- struct task_struct *task;
-
- if (ldlm_pools_thread)
- return -EALREADY;
-
- ldlm_pools_thread = kzalloc(sizeof(*ldlm_pools_thread), GFP_NOFS);
- if (!ldlm_pools_thread)
- return -ENOMEM;
-
- init_completion(&ldlm_pools_comp);
- init_waitqueue_head(&ldlm_pools_thread->t_ctl_waitq);
+ schedule_delayed_work(&ldlm_recalc_pools, 0);

- task = kthread_run(ldlm_pools_thread_main, ldlm_pools_thread,
- "ldlm_poold");
- if (IS_ERR(task)) {
- CERROR("Can't start pool thread, error %ld\n", PTR_ERR(task));
- kfree(ldlm_pools_thread);
- ldlm_pools_thread = NULL;
- return PTR_ERR(task);
- }
- wait_event_idle(ldlm_pools_thread->t_ctl_waitq,
- thread_is_running(ldlm_pools_thread));
return 0;
}

static void ldlm_pools_thread_stop(void)
{
- if (!ldlm_pools_thread)
- return;
-
- thread_set_flags(ldlm_pools_thread, SVC_STOPPING);
- wake_up(&ldlm_pools_thread->t_ctl_waitq);
-
- /*
- * Make sure that pools thread is finished before freeing @thread.
- * This fixes possible race and oops due to accessing freed memory
- * in pools thread.
- */
- wait_for_completion(&ldlm_pools_comp);
- kfree(ldlm_pools_thread);
- ldlm_pools_thread = NULL;
+ cancel_delayed_work_sync(&ldlm_recalc_pools);
}

static struct shrinker ldlm_pools_cli_shrinker = {
@@ -1081,20 +1009,15 @@ int ldlm_pools_init(void)
int rc;

rc = ldlm_pools_thread_start();
- if (rc)
- return rc;
-
- rc = register_shrinker(&ldlm_pools_cli_shrinker);
- if (rc)
- ldlm_pools_thread_stop();
+ if (!rc)
+ rc = register_shrinker(&ldlm_pools_cli_shrinker);

return rc;
}

void ldlm_pools_fini(void)
{
- if (ldlm_pools_thread)
- unregister_shrinker(&ldlm_pools_cli_shrinker);
+ unregister_shrinker(&ldlm_pools_cli_shrinker);

ldlm_pools_thread_stop();
}



2018-03-02 00:11:34

by NeilBrown

Subject: [PATCH 08/17] staging: lustre: obdclass: use workqueue for zombie management.

obdclass currently maintains two lists of data structures
(imports and exports), and a kthread which will free
anything on either list. The thread is woken whenever
anything is added to either list.

This is exactly the sort of thing that workqueues exist for.

So discard the zombie kthread and the lists and locks, and
create a single workqueue. Each obd_import and obd_export
gets a work_struct to attach to this workqueue.

This requires a small change to import_sec_validate_get()
which was testing if an obd_import was on the zombie
list. This cannot have every safely found it to be
on the list (as it could be freed asynchronously)
so it must be dead code.

We could use system_wq instead of creating a dedicated
zombie_wq, but as we occasionally want to flush all pending
work, it is a little nicer to only have to wait for our own
work items.

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/lustre/include/lustre_export.h | 2
.../staging/lustre/lustre/include/lustre_import.h | 4
drivers/staging/lustre/lustre/obdclass/genops.c | 193 ++------------------
drivers/staging/lustre/lustre/ptlrpc/sec.c | 6 -
4 files changed, 30 insertions(+), 175 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 66ac9dc7302a..40cd168ed2ea 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -87,6 +87,8 @@ struct obd_export {
struct obd_uuid exp_client_uuid;
/** To link all exports on an obd device */
struct list_head exp_obd_chain;
+ /** work_struct for destruction of export */
+ struct work_struct exp_zombie_work;
struct hlist_node exp_uuid_hash; /** uuid-export hash*/
/** Obd device of this export */
struct obd_device *exp_obd;
diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index ea158e0630e2..1731048f1ff2 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -162,8 +162,8 @@ struct obd_import {
struct ptlrpc_client *imp_client;
/** List element for linking into pinger chain */
struct list_head imp_pinger_chain;
- /** List element for linking into chain for destruction */
- struct list_head imp_zombie_chain;
+ /** work struct for destruction of import */
+ struct work_struct imp_zombie_work;

/**
* Lists of requests that are retained for replay, waiting for a reply,
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index 8f776a4058a9..63ccbabb4c5a 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -48,10 +48,7 @@ struct kmem_cache *obdo_cachep;
EXPORT_SYMBOL(obdo_cachep);
static struct kmem_cache *import_cachep;

-static struct list_head obd_zombie_imports;
-static struct list_head obd_zombie_exports;
-static spinlock_t obd_zombie_impexp_lock;
-static void obd_zombie_impexp_notify(void);
+static struct workqueue_struct *zombie_wq;
static void obd_zombie_export_add(struct obd_export *exp);
static void obd_zombie_import_add(struct obd_import *imp);

@@ -701,6 +698,13 @@ void class_export_put(struct obd_export *exp)
}
EXPORT_SYMBOL(class_export_put);

+static void obd_zombie_exp_cull(struct work_struct *ws)
+{
+ struct obd_export *export = container_of(ws, struct obd_export, exp_zombie_work);
+
+ class_export_destroy(export);
+}
+
/* Creates a new export, adds it to the hash table, and returns a
* pointer to it. The refcount is 2: one for the hash reference, and
* one for the pointer returned by this function.
@@ -741,6 +745,7 @@ struct obd_export *class_new_export(struct obd_device *obd,
INIT_HLIST_NODE(&export->exp_uuid_hash);
spin_lock_init(&export->exp_bl_list_lock);
INIT_LIST_HEAD(&export->exp_bl_list);
+ INIT_WORK(&export->exp_zombie_work, obd_zombie_exp_cull);

export->exp_sp_peer = LUSTRE_SP_ANY;
export->exp_flvr.sf_rpc = SPTLRPC_FLVR_INVALID;
@@ -862,7 +867,6 @@ EXPORT_SYMBOL(class_import_get);

void class_import_put(struct obd_import *imp)
{
- LASSERT(list_empty(&imp->imp_zombie_chain));
LASSERT_ATOMIC_GT_LT(&imp->imp_refcount, 0, LI_POISON);

CDEBUG(D_INFO, "import %p refcount=%d obd=%s\n", imp,
@@ -894,6 +898,13 @@ static void init_imp_at(struct imp_at *at)
}
}

+static void obd_zombie_imp_cull(struct work_struct *ws)
+{
+ struct obd_import *import = container_of(ws, struct obd_import, imp_zombie_work);
+
+ class_import_destroy(import);
+}
+
struct obd_import *class_new_import(struct obd_device *obd)
{
struct obd_import *imp;
@@ -903,7 +914,6 @@ struct obd_import *class_new_import(struct obd_device *obd)
return NULL;

INIT_LIST_HEAD(&imp->imp_pinger_chain);
- INIT_LIST_HEAD(&imp->imp_zombie_chain);
INIT_LIST_HEAD(&imp->imp_replay_list);
INIT_LIST_HEAD(&imp->imp_sending_list);
INIT_LIST_HEAD(&imp->imp_delayed_list);
@@ -917,6 +927,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
imp->imp_obd = class_incref(obd, "import", imp);
mutex_init(&imp->imp_sec_mutex);
init_waitqueue_head(&imp->imp_recovery_waitq);
+ INIT_WORK(&imp->imp_zombie_work, obd_zombie_imp_cull);

atomic_set(&imp->imp_refcount, 2);
atomic_set(&imp->imp_unregistering, 0);
@@ -1098,81 +1109,6 @@ EXPORT_SYMBOL(class_fail_export);
void (*class_export_dump_hook)(struct obd_export *) = NULL;
#endif

-/* Total amount of zombies to be destroyed */
-static int zombies_count;
-
-/**
- * kill zombie imports and exports
- */
-static void obd_zombie_impexp_cull(void)
-{
- struct obd_import *import;
- struct obd_export *export;
-
- do {
- spin_lock(&obd_zombie_impexp_lock);
-
- import = NULL;
- if (!list_empty(&obd_zombie_imports)) {
- import = list_entry(obd_zombie_imports.next,
- struct obd_import,
- imp_zombie_chain);
- list_del_init(&import->imp_zombie_chain);
- }
-
- export = NULL;
- if (!list_empty(&obd_zombie_exports)) {
- export = list_entry(obd_zombie_exports.next,
- struct obd_export,
- exp_obd_chain);
- list_del_init(&export->exp_obd_chain);
- }
-
- spin_unlock(&obd_zombie_impexp_lock);
-
- if (import) {
- class_import_destroy(import);
- spin_lock(&obd_zombie_impexp_lock);
- zombies_count--;
- spin_unlock(&obd_zombie_impexp_lock);
- }
-
- if (export) {
- class_export_destroy(export);
- spin_lock(&obd_zombie_impexp_lock);
- zombies_count--;
- spin_unlock(&obd_zombie_impexp_lock);
- }
-
- cond_resched();
- } while (import || export);
-}
-
-static struct completion obd_zombie_start;
-static struct completion obd_zombie_stop;
-static unsigned long obd_zombie_flags;
-static wait_queue_head_t obd_zombie_waitq;
-static pid_t obd_zombie_pid;
-
-enum {
- OBD_ZOMBIE_STOP = 0x0001,
-};
-
-/**
- * check for work for kill zombie import/export thread.
- */
-static int obd_zombie_impexp_check(void *arg)
-{
- int rc;
-
- spin_lock(&obd_zombie_impexp_lock);
- rc = (zombies_count == 0) &&
- !test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags);
- spin_unlock(&obd_zombie_impexp_lock);
-
- return rc;
-}
-
/**
* Add export to the obd_zombie thread and notify it.
*/
@@ -1182,12 +1118,7 @@ static void obd_zombie_export_add(struct obd_export *exp)
LASSERT(!list_empty(&exp->exp_obd_chain));
list_del_init(&exp->exp_obd_chain);
spin_unlock(&exp->exp_obd->obd_dev_lock);
- spin_lock(&obd_zombie_impexp_lock);
- zombies_count++;
- list_add(&exp->exp_obd_chain, &obd_zombie_exports);
- spin_unlock(&obd_zombie_impexp_lock);
-
- obd_zombie_impexp_notify();
+ queue_work(zombie_wq, &exp->exp_zombie_work);
}

/**
@@ -1196,40 +1127,7 @@ static void obd_zombie_export_add(struct obd_export *exp)
static void obd_zombie_import_add(struct obd_import *imp)
{
LASSERT(!imp->imp_sec);
- spin_lock(&obd_zombie_impexp_lock);
- LASSERT(list_empty(&imp->imp_zombie_chain));
- zombies_count++;
- list_add(&imp->imp_zombie_chain, &obd_zombie_imports);
- spin_unlock(&obd_zombie_impexp_lock);
-
- obd_zombie_impexp_notify();
-}
-
-/**
- * notify import/export destroy thread about new zombie.
- */
-static void obd_zombie_impexp_notify(void)
-{
- /*
- * Make sure obd_zombie_impexp_thread get this notification.
- * It is possible this signal only get by obd_zombie_barrier, and
- * barrier gulps this notification and sleeps away and hangs ensues
- */
- wake_up_all(&obd_zombie_waitq);
-}
-
-/**
- * check whether obd_zombie is idle
- */
-static int obd_zombie_is_idle(void)
-{
- int rc;
-
- LASSERT(!test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags));
- spin_lock(&obd_zombie_impexp_lock);
- rc = (zombies_count == 0);
- spin_unlock(&obd_zombie_impexp_lock);
- return rc;
+ queue_work(zombie_wq, &imp->imp_zombie_work);
}

/**
@@ -1237,60 +1135,19 @@ static int obd_zombie_is_idle(void)
*/
void obd_zombie_barrier(void)
{
- if (obd_zombie_pid == current_pid())
- /* don't wait for myself */
- return;
- wait_event_idle(obd_zombie_waitq, obd_zombie_is_idle());
+ flush_workqueue(zombie_wq);
}
EXPORT_SYMBOL(obd_zombie_barrier);

-/**
- * destroy zombie export/import thread.
- */
-static int obd_zombie_impexp_thread(void *unused)
-{
- unshare_fs_struct();
- complete(&obd_zombie_start);
-
- obd_zombie_pid = current_pid();
-
- while (!test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags)) {
- wait_event_idle(obd_zombie_waitq,
- !obd_zombie_impexp_check(NULL));
- obd_zombie_impexp_cull();
-
- /*
- * Notify obd_zombie_barrier callers that queues
- * may be empty.
- */
- wake_up(&obd_zombie_waitq);
- }
-
- complete(&obd_zombie_stop);
-
- return 0;
-}
-
/**
* start destroy zombie import/export thread
*/
int obd_zombie_impexp_init(void)
{
- struct task_struct *task;
-
- INIT_LIST_HEAD(&obd_zombie_imports);
- INIT_LIST_HEAD(&obd_zombie_exports);
- spin_lock_init(&obd_zombie_impexp_lock);
- init_completion(&obd_zombie_start);
- init_completion(&obd_zombie_stop);
- init_waitqueue_head(&obd_zombie_waitq);
- obd_zombie_pid = 0;
-
- task = kthread_run(obd_zombie_impexp_thread, NULL, "obd_zombid");
- if (IS_ERR(task))
- return PTR_ERR(task);
+ zombie_wq = alloc_workqueue("obd_zombid", 0, 0);
+ if (!zombie_wq)
+ return -ENOMEM;

- wait_for_completion(&obd_zombie_start);
return 0;
}

@@ -1299,9 +1156,7 @@ int obd_zombie_impexp_init(void)
*/
void obd_zombie_impexp_stop(void)
{
- set_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags);
- obd_zombie_impexp_notify();
- wait_for_completion(&obd_zombie_stop);
+ destroy_workqueue(zombie_wq);
}

struct obd_request_slot_waiter {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec.c b/drivers/staging/lustre/lustre/ptlrpc/sec.c
index f152ba1af0fc..3cb1e075f077 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec.c
@@ -339,11 +339,9 @@ static int import_sec_validate_get(struct obd_import *imp,
}

*sec = sptlrpc_import_sec_ref(imp);
- /* Only output an error when the import is still active */
if (!*sec) {
- if (list_empty(&imp->imp_zombie_chain))
- CERROR("import %p (%s) with no sec\n",
- imp, ptlrpc_import_state_name(imp->imp_state));
+ CERROR("import %p (%s) with no sec\n",
+ imp, ptlrpc_import_state_name(imp->imp_state));
return -EACCES;
}




2018-03-02 00:12:11

by NeilBrown

Subject: [PATCH 05/17] staging: lustre: lnet: keep ln_nportals consistent

ln_nportals should be zero when no portals have
been allocated. This ensures that memory allocation failure
is handled correctly elsewhere.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lnet/lnet/lib-ptl.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 471f2f6c86f4..fc47379c5938 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -841,6 +841,7 @@ lnet_portals_destroy(void)

cfs_array_free(the_lnet.ln_portals);
the_lnet.ln_portals = NULL;
+ the_lnet.ln_nportals = 0;
}

int
@@ -851,12 +852,12 @@ lnet_portals_create(void)

size = offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]);

- the_lnet.ln_nportals = MAX_PORTALS;
- the_lnet.ln_portals = cfs_array_alloc(the_lnet.ln_nportals, size);
+ the_lnet.ln_portals = cfs_array_alloc(MAX_PORTALS, size);
if (!the_lnet.ln_portals) {
CERROR("Failed to allocate portals table\n");
return -ENOMEM;
}
+ the_lnet.ln_nportals = MAX_PORTALS;

for (i = 0; i < the_lnet.ln_nportals; i++) {
if (lnet_ptl_setup(the_lnet.ln_portals[i], i)) {



2018-03-02 01:01:34

by NeilBrown

Subject: [PATCH 07/17] staging: lustre: ptlrpc: change GFP_NOFS to GFP_KERNEL

These allocations are performed during initialization,
so they don't need GFP_NOFS.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c | 2 +-
drivers/staging/lustre/lustre/ptlrpc/service.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
index 577c5822b823..625b9520d78f 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
@@ -377,7 +377,7 @@ static inline void enc_pools_alloc(void)
page_pools.epp_pools =
kvzalloc(page_pools.epp_max_pools *
sizeof(*page_pools.epp_pools),
- GFP_NOFS);
+ GFP_KERNEL);
}

static inline void enc_pools_free(void)
diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 49417228b621..f37364e00dfe 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -2046,7 +2046,7 @@ static int ptlrpc_main(void *arg)
goto out;
}

- env = kzalloc(sizeof(*env), GFP_NOFS);
+ env = kzalloc(sizeof(*env), GFP_KERNEL);
if (!env) {
rc = -ENOMEM;
goto out_srv_fini;
@@ -2072,7 +2072,7 @@ static int ptlrpc_main(void *arg)
}

/* Alloc reply state structure for this one */
- rs = kvzalloc(svc->srv_max_reply_size, GFP_NOFS);
+ rs = kvzalloc(svc->srv_max_reply_size, GFP_KERNEL);
if (!rs) {
rc = -ENOMEM;
goto out_srv_fini;



2018-03-02 01:02:12

by NeilBrown

Subject: [PATCH 11/17] staging: lustre: ptlrpc: use workqueue for pinger

lustre has a "Pinger" kthread which periodically pings peers
to ensure all hosts are functioning.

This can more easily be done using a work queue.

As maintaining contact with other peers is important for
keeping the filesystem running, and as the filesystem might
be involved in freeing memory, it is safest to have a
separate WQ_MEM_RECLAIM workqueue.

The SVC_EVENT functionality to wake up the thread can be
replaced with mod_delayed_work().

Also use round_jiffies_up_relative() rather than setting a
minimum delay of 1 second. As PING_INTERVAL is measured in
seconds, this meets that need while allowing the workqueue to
keep wakeups synchronized.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/pinger.c | 81 +++++++------------------
1 file changed, 24 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
index b5f3cfee8e75..0775b7a048bb 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
@@ -217,21 +217,18 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
}
}

-static int ptlrpc_pinger_main(void *arg)
-{
- struct ptlrpc_thread *thread = arg;
-
- /* Record that the thread is running */
- thread_set_flags(thread, SVC_RUNNING);
- wake_up(&thread->t_ctl_waitq);
+static struct workqueue_struct *pinger_wq;
+static void ptlrpc_pinger_main(struct work_struct *ws);
+static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);

- /* And now, loop forever, pinging as needed. */
- while (1) {
- unsigned long this_ping = cfs_time_current();
- long time_to_next_wake;
- struct timeout_item *item;
- struct obd_import *imp;
+static void ptlrpc_pinger_main(struct work_struct *ws)
+{
+ unsigned long this_ping = cfs_time_current();
+ long time_to_next_wake;
+ struct timeout_item *item;
+ struct obd_import *imp;

+ do {
mutex_lock(&pinger_mutex);
list_for_each_entry(item, &timeout_list, ti_chain) {
item->ti_cb(item, item->ti_cb_data);
@@ -260,50 +257,24 @@ static int ptlrpc_pinger_main(void *arg)
time_to_next_wake,
cfs_time_add(this_ping,
PING_INTERVAL * HZ));
- if (time_to_next_wake > 0) {
- wait_event_idle_timeout(thread->t_ctl_waitq,
- thread_is_stopping(thread) ||
- thread_is_event(thread),
- max_t(long, time_to_next_wake, HZ));
- if (thread_test_and_clear_flags(thread, SVC_STOPPING))
- break;
- /* woken after adding import to reset timer */
- thread_test_and_clear_flags(thread, SVC_EVENT);
- }
- }
+ } while (time_to_next_wake <= 0);

- thread_set_flags(thread, SVC_STOPPED);
- wake_up(&thread->t_ctl_waitq);
-
- CDEBUG(D_NET, "pinger thread exiting, process %d\n", current_pid());
- return 0;
+ queue_delayed_work(pinger_wq, &ping_work,
+ round_jiffies_up_relative(time_to_next_wake));
}

-static struct ptlrpc_thread pinger_thread;
-
int ptlrpc_start_pinger(void)
{
- struct task_struct *task;
- int rc;
-
- if (!thread_is_init(&pinger_thread) &&
- !thread_is_stopped(&pinger_thread))
+ if (pinger_wq)
return -EALREADY;

- init_waitqueue_head(&pinger_thread.t_ctl_waitq);
-
- strcpy(pinger_thread.t_name, "ll_ping");
-
- task = kthread_run(ptlrpc_pinger_main, &pinger_thread,
- pinger_thread.t_name);
- if (IS_ERR(task)) {
- rc = PTR_ERR(task);
- CERROR("cannot start pinger thread: rc = %d\n", rc);
- return rc;
+ pinger_wq = alloc_workqueue("ptlrpc_pinger", WQ_MEM_RECLAIM, 1);
+ if (!pinger_wq) {
+ CERROR("cannot start pinger workqueue\n");
+ return -ENOMEM;
}
- wait_event_idle(pinger_thread.t_ctl_waitq,
- thread_is_running(&pinger_thread));

+ queue_delayed_work(pinger_wq, &ping_work, 0);
return 0;
}

@@ -313,16 +284,13 @@ int ptlrpc_stop_pinger(void)
{
int rc = 0;

- if (thread_is_init(&pinger_thread) ||
- thread_is_stopped(&pinger_thread))
+ if (!pinger_wq)
return -EALREADY;

ptlrpc_pinger_remove_timeouts();
- thread_set_flags(&pinger_thread, SVC_STOPPING);
- wake_up(&pinger_thread.t_ctl_waitq);
-
- wait_event_idle(pinger_thread.t_ctl_waitq,
- thread_is_stopped(&pinger_thread));
+ cancel_delayed_work_sync(&ping_work);
+ destroy_workqueue(pinger_wq);
+ pinger_wq = NULL;

return rc;
}
@@ -505,6 +473,5 @@ static int ptlrpc_pinger_remove_timeouts(void)

void ptlrpc_pinger_wake_up(void)
{
- thread_add_flags(&pinger_thread, SVC_EVENT);
- wake_up(&pinger_thread.t_ctl_waitq);
+ mod_delayed_work(pinger_wq, &ping_work, 0);
}



2018-03-02 01:02:48

by NeilBrown

[permalink] [raw]
Subject: [PATCH 12/17] staging: lustre: remove unused flag from ptlrpc_thread

SVC_EVENT is no longer used.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lustre/include/lustre_net.h | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index 5a4434e7c85a..108683c54127 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -1259,7 +1259,6 @@ enum {
SVC_STOPPING = 1 << 1,
SVC_STARTING = 1 << 2,
SVC_RUNNING = 1 << 3,
- SVC_EVENT = 1 << 4,
};

#define PTLRPC_THR_NAME_LEN 32
@@ -1302,11 +1301,6 @@ struct ptlrpc_thread {
char t_name[PTLRPC_THR_NAME_LEN];
};

-static inline int thread_is_init(struct ptlrpc_thread *thread)
-{
- return thread->t_flags == 0;
-}
-
static inline int thread_is_stopped(struct ptlrpc_thread *thread)
{
return !!(thread->t_flags & SVC_STOPPED);
@@ -1327,11 +1321,6 @@ static inline int thread_is_running(struct ptlrpc_thread *thread)
return !!(thread->t_flags & SVC_RUNNING);
}

-static inline int thread_is_event(struct ptlrpc_thread *thread)
-{
- return !!(thread->t_flags & SVC_EVENT);
-}
-
static inline void thread_clear_flags(struct ptlrpc_thread *thread, __u32 flags)
{
thread->t_flags &= ~flags;



2018-03-02 01:16:08

by NeilBrown

[permalink] [raw]
Subject: [PATCH 16/17] staging: lustre: allow monolithic builds

Remove the restriction that lustre must be built
as a module. It now works as a monolithic build.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/staging/lustre/lnet/Kconfig | 2 +-
drivers/staging/lustre/lustre/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/Kconfig b/drivers/staging/lustre/lnet/Kconfig
index 6bcb53d0c6f4..ad049e6f24e4 100644
--- a/drivers/staging/lustre/lnet/Kconfig
+++ b/drivers/staging/lustre/lnet/Kconfig
@@ -1,6 +1,6 @@
config LNET
tristate "Lustre networking subsystem (LNet)"
- depends on INET && m
+ depends on INET
help
The Lustre network layer, also known as LNet, is a networking abstaction
level API that was initially created to allow Lustre Filesystem to utilize
diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
index 90d826946c6a..c669c8fa0cc6 100644
--- a/drivers/staging/lustre/lustre/Kconfig
+++ b/drivers/staging/lustre/lustre/Kconfig
@@ -1,6 +1,6 @@
config LUSTRE_FS
tristate "Lustre file system client support"
- depends on m && !MIPS && !XTENSA && !SUPERH
+ depends on !MIPS && !XTENSA && !SUPERH
depends on LNET
select CRYPTO
select CRYPTO_CRC32



2018-03-07 20:50:29

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 01/17] staging: lustre: obd_mount: use correct niduuid suffix.

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Commit 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf") moved
> some sprintf() calls earlier in the code to combine them with
> memory allocation and create kasprintf() calls.
>
> In one case, this code movement moved the sprintf to a location where the
> values being formatted were different.
> In particular
> sprintf(niduuid, "%s_%x", mgcname, i);
> was moved from *after* the line
> i = 0;
> to a location where the value of 'i' was at least 1.
>
> This causes the wrong name to be formatted, and triggers
>
> CERROR("del MDC UUID %s failed: rc = %d\n",
> niduuid, rc);
>
> at unmount time.
>
> So use '0' instead of 'i'.
>
> Fixes: 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf")
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/obdclass/obd_mount.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> index acc1ea773c9c..f5e8214ac37b 100644
> --- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> +++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> @@ -243,7 +243,7 @@ int lustre_start_mgc(struct super_block *sb)
> libcfs_nid2str_r(nid, nidstr, sizeof(nidstr));
> mgcname = kasprintf(GFP_NOFS,
> "%s%s", LUSTRE_MGC_OBDNAME, nidstr);
> - niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, i);
> + niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, 0);
> if (!mgcname || !niduuid) {
> rc = -ENOMEM;
> goto out_free;
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-07 20:53:20

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 02/17] staging: lustre: fix bug in osc_enter_cache_try

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> The lustre-release patch commit bdc5bb52c554 ("LU-4933 osc:
> Automatically increase the max_dirty_mb") changed
>
> - if (cli->cl_dirty + PAGE_CACHE_SIZE <= cli->cl_dirty_max &&
> + if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&
>
> When this patch landed in Linux a couple of years later, it landed as
>
> - if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
> + if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
>
> which is clearly different ('<=' vs '<'), and allows cl_dirty_pages to
> increase beyond cl_dirty_max_pages - which causes a later assertion
> to fail.
>
> Fixes: 3147b268400a ("staging: lustre: osc: Automatically increase the max_dirty_mb")
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/include/obd.h | 2 +-
> drivers/staging/lustre/lustre/osc/osc_cache.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
> index 4368f4e9f208..f1233ca7d337 100644
> --- a/drivers/staging/lustre/lustre/include/obd.h
> +++ b/drivers/staging/lustre/lustre/include/obd.h
> @@ -191,7 +191,7 @@ struct client_obd {
> struct sptlrpc_flavor cl_flvr_mgc; /* fixed flavor of mgc->mgs */
>
> /* the grant values are protected by loi_list_lock below */
> - unsigned long cl_dirty_pages; /* all _dirty_ in pahges */
> + unsigned long cl_dirty_pages; /* all _dirty_ in pages */
> unsigned long cl_dirty_max_pages; /* allowed w/o rpc */
> unsigned long cl_dirty_transit; /* dirty synchronous */
> unsigned long cl_avail_grant; /* bytes of credit for ost */
> diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
> index 1c70a504ee89..459503727ce3 100644
> --- a/drivers/staging/lustre/lustre/osc/osc_cache.c
> +++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
> @@ -1529,7 +1529,7 @@ static int osc_enter_cache_try(struct client_obd *cli,
> if (rc < 0)
> return 0;
>
> - if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
> + if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&
> atomic_long_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
> osc_consume_write_grant(cli, &oap->oap_brw_page);
> if (transient) {
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-07 21:11:21

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 03/17] staging: lustre: statahead: remove incorrect test on agl_list_empty()

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Including agl_list_empty() in the wait_event_idle() condition
> is pointless as the body of the loop doesn't do anything
> about the agl list.
> So if the list wasn't empty, the while loop would spin
> indefinitely.
>
> The test was removed in the lustre-release commit
> 672ab0e00d61 ("LU-3270 statahead: small fixes and cleanup"),
> but not in the Linux commit 5231f7651c55 ("staging: lustre:
> statahead: small fixes and cleanup").
>
> Fixes: 5231f7651c55 ("staging: lustre: statahead: small fixes and cleanup")
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/llite/statahead.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 6052bfd7ff05..ba00881a5745 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -1124,7 +1124,6 @@ static int ll_statahead_thread(void *arg)
> while (thread_is_running(sa_thread)) {
> wait_event_idle(sa_thread->t_ctl_waitq,
> sa_has_callback(sai) ||
> - !agl_list_empty(sai) ||
> !thread_is_running(sa_thread));
>
> sa_handle_callback(sai);
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-07 21:11:32

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 04/17] staging: lustre: obdclass: don't require lct_owner to be non-NULL.

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Some places in lu_object.c allow lct_owner to be NULL, implying
> that the code is built in to the kernel (not a module), but
> two places don't. This prevents us from building lustre into
> the kernel.
>
> So remove the requirement and always allow lct_owner to be NULL.
>
> This requires removing an "assert" that the module count is positive,
> but this is redundant as module_put() already does the necessary test.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/obdclass/lu_object.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index cca688175d2d..880800e78c52 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -1380,12 +1380,8 @@ static void key_fini(struct lu_context *ctx, int index)
> lu_ref_del(&key->lct_reference, "ctx", ctx);
> atomic_dec(&key->lct_used);
>
> - if ((ctx->lc_tags & LCT_NOREF) == 0) {
> -#ifdef CONFIG_MODULE_UNLOAD
> - LINVRNT(module_refcount(key->lct_owner) > 0);
> -#endif
> + if ((ctx->lc_tags & LCT_NOREF) == 0)
> module_put(key->lct_owner);
> - }
> ctx->lc_value[index] = NULL;
> }
> }
> @@ -1619,7 +1615,6 @@ static int keys_fill(struct lu_context *ctx)
> LINVRNT(key->lct_init);
> LINVRNT(key->lct_index == i);
>
> - LASSERT(key->lct_owner);
> if (!(ctx->lc_tags & LCT_NOREF) &&
> !try_module_get(key->lct_owner)) {
> /* module is unloading, skip this key */
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-07 21:26:43

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 05/17] staging: lustre: lnet: keep ln_nportals consistent

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> ln_nportals should be zero when no portals have
> been allocated. This ensures that memory allocation failure
> is handled correctly elsewhere.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lnet/lnet/lib-ptl.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> index 471f2f6c86f4..fc47379c5938 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> @@ -841,6 +841,7 @@ lnet_portals_destroy(void)
>
> cfs_array_free(the_lnet.ln_portals);
> the_lnet.ln_portals = NULL;
> + the_lnet.ln_nportals = 0;
> }
>
> int
> @@ -851,12 +852,12 @@ lnet_portals_create(void)
>
> size = offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]);
>
> - the_lnet.ln_nportals = MAX_PORTALS;
> - the_lnet.ln_portals = cfs_array_alloc(the_lnet.ln_nportals, size);
> + the_lnet.ln_portals = cfs_array_alloc(MAX_PORTALS, size);
> if (!the_lnet.ln_portals) {
> CERROR("Failed to allocate portals table\n");
> return -ENOMEM;
> }
> + the_lnet.ln_nportals = MAX_PORTALS;
>
> for (i = 0; i < the_lnet.ln_nportals; i++) {
> if (lnet_ptl_setup(the_lnet.ln_portals[i], i)) {
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 00:20:18

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 06/17] staging: lustre: get entropy from nid when nid set.

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> When the 'lustre' module is loaded, it gets a list of
> net devices and uses the node ids to add entropy
> to the prng. This means that the network interfaces need
> to be configured before the module is loaded, which prevents
> the module from being compiled into a monolithic kernel.
>
> So move this entropy addition to the moment when
> the interface is imported to LNet and the node id is first known.

It took me a while to convince myself this is correct, but this is
moving the entropy addition earlier in the startup sequence, and
that is a good thing. The important factor is to ensure that the
client UUID (generated at mount time) is unique across all clients,
and adding the node address to the entropy ensures this, even if many
thousands of identical diskless nodes boot and mount simultaneously.

Reviewed-by: Andreas Dilger <[email protected]>

> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lnet/lnet/api-ni.c | 7 +++++++
> drivers/staging/lustre/lustre/llite/super25.c | 17 +----------------
> 2 files changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index 48d25ccadbb3..90266be0132d 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -1214,6 +1214,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> struct lnet_lnd *lnd;
> struct lnet_tx_queue *tq;
> int i;
> + u32 seed;
>
> lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
>
> @@ -1352,6 +1353,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> tq->tq_credits = lnet_ni_tq_credits(ni);
> }
>
> + /* Nodes with small feet have little entropy. The NID for this
> + * node gives the most entropy in the low bits.
> + */
> + seed = LNET_NIDADDR(ni->ni_nid);
> + add_device_randomness(&seed, sizeof(seed));
> +
> CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
> libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
> lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
> diff --git a/drivers/staging/lustre/lustre/llite/super25.c b/drivers/staging/lustre/lustre/llite/super25.c
> index 9b0bb3541a84..861e7a60f408 100644
> --- a/drivers/staging/lustre/lustre/llite/super25.c
> +++ b/drivers/staging/lustre/lustre/llite/super25.c
> @@ -85,8 +85,7 @@ MODULE_ALIAS_FS("lustre");
>
> static int __init lustre_init(void)
> {
> - struct lnet_process_id lnet_id;
> - int i, rc;
> + int rc;
>
> BUILD_BUG_ON(sizeof(LUSTRE_VOLATILE_HDR) !=
> LUSTRE_VOLATILE_HDR_LEN + 1);
> @@ -125,20 +124,6 @@ static int __init lustre_init(void)
> goto out_debugfs;
> }
>
> - /* Nodes with small feet have little entropy. The NID for this
> - * node gives the most entropy in the low bits
> - */
> - for (i = 0;; i++) {
> - u32 seed;
> -
> - if (LNetGetId(i, &lnet_id) == -ENOENT)
> - break;
> - if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) {
> - seed = LNET_NIDADDR(lnet_id.nid);
> - add_device_randomness(&seed, sizeof(seed));
> - }
> - }
> -
> rc = vvp_global_init();
> if (rc != 0)
> goto out_sysfs;
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 00:21:22

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 07/17] staging: lustre: ptlrpc: change GFP_NOFS to GFP_KERNEL

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> These allocations are performed during initialization,
> so they don't need GFP_NOFS.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c | 2 +-
> drivers/staging/lustre/lustre/ptlrpc/service.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> index 577c5822b823..625b9520d78f 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> @@ -377,7 +377,7 @@ static inline void enc_pools_alloc(void)
> page_pools.epp_pools =
> kvzalloc(page_pools.epp_max_pools *
> sizeof(*page_pools.epp_pools),
> - GFP_NOFS);
> + GFP_KERNEL);
> }
>
> static inline void enc_pools_free(void)
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
> index 49417228b621..f37364e00dfe 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/service.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
> @@ -2046,7 +2046,7 @@ static int ptlrpc_main(void *arg)
> goto out;
> }
>
> - env = kzalloc(sizeof(*env), GFP_NOFS);
> + env = kzalloc(sizeof(*env), GFP_KERNEL);
> if (!env) {
> rc = -ENOMEM;
> goto out_srv_fini;
> @@ -2072,7 +2072,7 @@ static int ptlrpc_main(void *arg)
> }
>
> /* Alloc reply state structure for this one */
> - rs = kvzalloc(svc->srv_max_reply_size, GFP_NOFS);
> + rs = kvzalloc(svc->srv_max_reply_size, GFP_KERNEL);
> if (!rs) {
> rc = -ENOMEM;
> goto out_srv_fini;
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 00:28:38

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 08/17] staging: lustre: obdclass: use workqueue for zombie management.

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> obdclass currently maintains two lists of data structures
> (imports and exports), and a kthread which will free
> anything on either list. The thread is woken whenever
> anything is added to either list.
>
> This is exactly the sort of thing that workqueues exist for.
>
> So discard the zombie kthread and the lists and locks, and
> create a single workqueue. Each obd_import and obd_export
> gets a work_struct to attach to this workqueue.
>
> This requires a small change to import_sec_validate_get()
> which was testing if an obd_import was on the zombie
> list. This cannot have every safely found it to be
> on the list (as it could be freed asynchronously)
> so it must be dead code.
>
> We could use system_wq instead of creating a dedicated
> zombie_wq, but as we occasionally want to flush all pending
> work, it is a little nicer to only have to wait for our own
> work items.

Nice cleanup. Lustre definitely has too many threads, but
kernel work queues didn't exist in the dark ages.

I CC'd Alexey, since he wrote this code initially, in case
there is anything special to be aware of.

Reviewed-by: Andreas Dilger <[email protected]>

> Signed-off-by: NeilBrown <[email protected]>
> ---
> .../staging/lustre/lustre/include/lustre_export.h | 2
> .../staging/lustre/lustre/include/lustre_import.h | 4
> drivers/staging/lustre/lustre/obdclass/genops.c | 193 ++------------------
> drivers/staging/lustre/lustre/ptlrpc/sec.c | 6 -
> 4 files changed, 30 insertions(+), 175 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
> index 66ac9dc7302a..40cd168ed2ea 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_export.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_export.h
> @@ -87,6 +87,8 @@ struct obd_export {
> struct obd_uuid exp_client_uuid;
> /** To link all exports on an obd device */
> struct list_head exp_obd_chain;
> + /** work_struct for destruction of export */
> + struct work_struct exp_zombie_work;
> struct hlist_node exp_uuid_hash; /** uuid-export hash*/
> /** Obd device of this export */
> struct obd_device *exp_obd;
> diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
> index ea158e0630e2..1731048f1ff2 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_import.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_import.h
> @@ -162,8 +162,8 @@ struct obd_import {
> struct ptlrpc_client *imp_client;
> /** List element for linking into pinger chain */
> struct list_head imp_pinger_chain;
> - /** List element for linking into chain for destruction */
> - struct list_head imp_zombie_chain;
> + /** work struct for destruction of import */
> + struct work_struct imp_zombie_work;
>
> /**
> * Lists of requests that are retained for replay, waiting for a reply,
> diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
> index 8f776a4058a9..63ccbabb4c5a 100644
> --- a/drivers/staging/lustre/lustre/obdclass/genops.c
> +++ b/drivers/staging/lustre/lustre/obdclass/genops.c
> @@ -48,10 +48,7 @@ struct kmem_cache *obdo_cachep;
> EXPORT_SYMBOL(obdo_cachep);
> static struct kmem_cache *import_cachep;
>
> -static struct list_head obd_zombie_imports;
> -static struct list_head obd_zombie_exports;
> -static spinlock_t obd_zombie_impexp_lock;
> -static void obd_zombie_impexp_notify(void);
> +static struct workqueue_struct *zombie_wq;
> static void obd_zombie_export_add(struct obd_export *exp);
> static void obd_zombie_import_add(struct obd_import *imp);
>
> @@ -701,6 +698,13 @@ void class_export_put(struct obd_export *exp)
> }
> EXPORT_SYMBOL(class_export_put);
>
> +static void obd_zombie_exp_cull(struct work_struct *ws)
> +{
> + struct obd_export *export = container_of(ws, struct obd_export, exp_zombie_work);
> +
> + class_export_destroy(export);
> +}
> +
> /* Creates a new export, adds it to the hash table, and returns a
> * pointer to it. The refcount is 2: one for the hash reference, and
> * one for the pointer returned by this function.
> @@ -741,6 +745,7 @@ struct obd_export *class_new_export(struct obd_device *obd,
> INIT_HLIST_NODE(&export->exp_uuid_hash);
> spin_lock_init(&export->exp_bl_list_lock);
> INIT_LIST_HEAD(&export->exp_bl_list);
> + INIT_WORK(&export->exp_zombie_work, obd_zombie_exp_cull);
>
> export->exp_sp_peer = LUSTRE_SP_ANY;
> export->exp_flvr.sf_rpc = SPTLRPC_FLVR_INVALID;
> @@ -862,7 +867,6 @@ EXPORT_SYMBOL(class_import_get);
>
> void class_import_put(struct obd_import *imp)
> {
> - LASSERT(list_empty(&imp->imp_zombie_chain));
> LASSERT_ATOMIC_GT_LT(&imp->imp_refcount, 0, LI_POISON);
>
> CDEBUG(D_INFO, "import %p refcount=%d obd=%s\n", imp,
> @@ -894,6 +898,13 @@ static void init_imp_at(struct imp_at *at)
> }
> }
>
> +static void obd_zombie_imp_cull(struct work_struct *ws)
> +{
> + struct obd_import *import = container_of(ws, struct obd_import, imp_zombie_work);
> +
> + class_import_destroy(import);
> +}
> +
> struct obd_import *class_new_import(struct obd_device *obd)
> {
> struct obd_import *imp;
> @@ -903,7 +914,6 @@ struct obd_import *class_new_import(struct obd_device *obd)
> return NULL;
>
> INIT_LIST_HEAD(&imp->imp_pinger_chain);
> - INIT_LIST_HEAD(&imp->imp_zombie_chain);
> INIT_LIST_HEAD(&imp->imp_replay_list);
> INIT_LIST_HEAD(&imp->imp_sending_list);
> INIT_LIST_HEAD(&imp->imp_delayed_list);
> @@ -917,6 +927,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
> imp->imp_obd = class_incref(obd, "import", imp);
> mutex_init(&imp->imp_sec_mutex);
> init_waitqueue_head(&imp->imp_recovery_waitq);
> + INIT_WORK(&imp->imp_zombie_work, obd_zombie_imp_cull);
>
> atomic_set(&imp->imp_refcount, 2);
> atomic_set(&imp->imp_unregistering, 0);
> @@ -1098,81 +1109,6 @@ EXPORT_SYMBOL(class_fail_export);
> void (*class_export_dump_hook)(struct obd_export *) = NULL;
> #endif
>
> -/* Total amount of zombies to be destroyed */
> -static int zombies_count;
> -
> -/**
> - * kill zombie imports and exports
> - */
> -static void obd_zombie_impexp_cull(void)
> -{
> - struct obd_import *import;
> - struct obd_export *export;
> -
> - do {
> - spin_lock(&obd_zombie_impexp_lock);
> -
> - import = NULL;
> - if (!list_empty(&obd_zombie_imports)) {
> - import = list_entry(obd_zombie_imports.next,
> - struct obd_import,
> - imp_zombie_chain);
> - list_del_init(&import->imp_zombie_chain);
> - }
> -
> - export = NULL;
> - if (!list_empty(&obd_zombie_exports)) {
> - export = list_entry(obd_zombie_exports.next,
> - struct obd_export,
> - exp_obd_chain);
> - list_del_init(&export->exp_obd_chain);
> - }
> -
> - spin_unlock(&obd_zombie_impexp_lock);
> -
> - if (import) {
> - class_import_destroy(import);
> - spin_lock(&obd_zombie_impexp_lock);
> - zombies_count--;
> - spin_unlock(&obd_zombie_impexp_lock);
> - }
> -
> - if (export) {
> - class_export_destroy(export);
> - spin_lock(&obd_zombie_impexp_lock);
> - zombies_count--;
> - spin_unlock(&obd_zombie_impexp_lock);
> - }
> -
> - cond_resched();
> - } while (import || export);
> -}
> -
> -static struct completion obd_zombie_start;
> -static struct completion obd_zombie_stop;
> -static unsigned long obd_zombie_flags;
> -static wait_queue_head_t obd_zombie_waitq;
> -static pid_t obd_zombie_pid;
> -
> -enum {
> - OBD_ZOMBIE_STOP = 0x0001,
> -};
> -
> -/**
> - * check for work for kill zombie import/export thread.
> - */
> -static int obd_zombie_impexp_check(void *arg)
> -{
> - int rc;
> -
> - spin_lock(&obd_zombie_impexp_lock);
> - rc = (zombies_count == 0) &&
> - !test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags);
> - spin_unlock(&obd_zombie_impexp_lock);
> -
> - return rc;
> -}
> -
> /**
> * Add export to the obd_zombie thread and notify it.
> */
> @@ -1182,12 +1118,7 @@ static void obd_zombie_export_add(struct obd_export *exp)
> LASSERT(!list_empty(&exp->exp_obd_chain));
> list_del_init(&exp->exp_obd_chain);
> spin_unlock(&exp->exp_obd->obd_dev_lock);
> - spin_lock(&obd_zombie_impexp_lock);
> - zombies_count++;
> - list_add(&exp->exp_obd_chain, &obd_zombie_exports);
> - spin_unlock(&obd_zombie_impexp_lock);
> -
> - obd_zombie_impexp_notify();
> + queue_work(zombie_wq, &exp->exp_zombie_work);
> }
>
> /**
> @@ -1196,40 +1127,7 @@ static void obd_zombie_export_add(struct obd_export *exp)
> static void obd_zombie_import_add(struct obd_import *imp)
> {
> LASSERT(!imp->imp_sec);
> - spin_lock(&obd_zombie_impexp_lock);
> - LASSERT(list_empty(&imp->imp_zombie_chain));
> - zombies_count++;
> - list_add(&imp->imp_zombie_chain, &obd_zombie_imports);
> - spin_unlock(&obd_zombie_impexp_lock);
> -
> - obd_zombie_impexp_notify();
> -}
> -
> -/**
> - * notify import/export destroy thread about new zombie.
> - */
> -static void obd_zombie_impexp_notify(void)
> -{
> - /*
> - * Make sure obd_zombie_impexp_thread get this notification.
> - * It is possible this signal only get by obd_zombie_barrier, and
> - * barrier gulps this notification and sleeps away and hangs ensues
> - */
> - wake_up_all(&obd_zombie_waitq);
> -}
> -
> -/**
> - * check whether obd_zombie is idle
> - */
> -static int obd_zombie_is_idle(void)
> -{
> - int rc;
> -
> - LASSERT(!test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags));
> - spin_lock(&obd_zombie_impexp_lock);
> - rc = (zombies_count == 0);
> - spin_unlock(&obd_zombie_impexp_lock);
> - return rc;
> + queue_work(zombie_wq, &imp->imp_zombie_work);
> }
>
> /**
> @@ -1237,60 +1135,19 @@ static int obd_zombie_is_idle(void)
> */
> void obd_zombie_barrier(void)
> {
> - if (obd_zombie_pid == current_pid())
> - /* don't wait for myself */
> - return;
> - wait_event_idle(obd_zombie_waitq, obd_zombie_is_idle());
> + flush_workqueue(zombie_wq);
> }
> EXPORT_SYMBOL(obd_zombie_barrier);
>
> -/**
> - * destroy zombie export/import thread.
> - */
> -static int obd_zombie_impexp_thread(void *unused)
> -{
> - unshare_fs_struct();
> - complete(&obd_zombie_start);
> -
> - obd_zombie_pid = current_pid();
> -
> - while (!test_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags)) {
> - wait_event_idle(obd_zombie_waitq,
> - !obd_zombie_impexp_check(NULL));
> - obd_zombie_impexp_cull();
> -
> - /*
> - * Notify obd_zombie_barrier callers that queues
> - * may be empty.
> - */
> - wake_up(&obd_zombie_waitq);
> - }
> -
> - complete(&obd_zombie_stop);
> -
> - return 0;
> -}
> -
> /**
> * start destroy zombie import/export thread
> */
> int obd_zombie_impexp_init(void)
> {
> - struct task_struct *task;
> -
> - INIT_LIST_HEAD(&obd_zombie_imports);
> - INIT_LIST_HEAD(&obd_zombie_exports);
> - spin_lock_init(&obd_zombie_impexp_lock);
> - init_completion(&obd_zombie_start);
> - init_completion(&obd_zombie_stop);
> - init_waitqueue_head(&obd_zombie_waitq);
> - obd_zombie_pid = 0;
> -
> - task = kthread_run(obd_zombie_impexp_thread, NULL, "obd_zombid");
> - if (IS_ERR(task))
> - return PTR_ERR(task);
> + zombie_wq = alloc_workqueue("obd_zombid", 0, 0);
> + if (!zombie_wq)
> + return -ENOMEM;
>
> - wait_for_completion(&obd_zombie_start);
> return 0;
> }
>
> @@ -1299,9 +1156,7 @@ int obd_zombie_impexp_init(void)
> */
> void obd_zombie_impexp_stop(void)
> {
> - set_bit(OBD_ZOMBIE_STOP, &obd_zombie_flags);
> - obd_zombie_impexp_notify();
> - wait_for_completion(&obd_zombie_stop);
> + destroy_workqueue(zombie_wq);
> }
>
> struct obd_request_slot_waiter {
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec.c b/drivers/staging/lustre/lustre/ptlrpc/sec.c
> index f152ba1af0fc..3cb1e075f077 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/sec.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/sec.c
> @@ -339,11 +339,9 @@ static int import_sec_validate_get(struct obd_import *imp,
> }
>
> *sec = sptlrpc_import_sec_ref(imp);
> - /* Only output an error when the import is still active */
> if (!*sec) {
> - if (list_empty(&imp->imp_zombie_chain))
> - CERROR("import %p (%s) with no sec\n",
> - imp, ptlrpc_import_state_name(imp->imp_state));
> + CERROR("import %p (%s) with no sec\n",
> + imp, ptlrpc_import_state_name(imp->imp_state));
> return -EACCES;
> }
>
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 19:24:15

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 09/17] staging: lustre: ldlm: use delayed_work for pools_recalc

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> ldlm currently has a kthread which wakes up every so often
> and calls ldlm_pools_recalc().
> The thread is started and stopped, but no other external interactions
> happen.
>
> This can trivially be replaced by a delayed_work if we have
> ldlm_pools_recalc() reschedule the work rather than just report
> when to do that.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_pool.c | 99 +++---------------------
> 1 file changed, 11 insertions(+), 88 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> index a0e486b57e08..53b8f33e54b5 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> @@ -784,9 +784,6 @@ static int ldlm_pool_granted(struct ldlm_pool *pl)
> return atomic_read(&pl->pl_granted);
> }
>
> -static struct ptlrpc_thread *ldlm_pools_thread;
> -static struct completion ldlm_pools_comp;
> -
> /*
> * count locks from all namespaces (if possible). Returns number of
> * cached locks.
> @@ -899,8 +896,12 @@ static unsigned long ldlm_pools_cli_scan(struct shrinker *s,
> sc->gfp_mask);
> }
>
> -static int ldlm_pools_recalc(enum ldlm_side client)
> +static void ldlm_pools_recalc(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(ldlm_recalc_pools, ldlm_pools_recalc);
> +
> +static void ldlm_pools_recalc(struct work_struct *ws)
> {
> + enum ldlm_side client = LDLM_NAMESPACE_CLIENT;
> struct ldlm_namespace *ns;
> struct ldlm_namespace *ns_old = NULL;
> /* seconds of sleep if no active namespaces */
> @@ -982,92 +983,19 @@ static int ldlm_pools_recalc(enum ldlm_side client)
> /* Wake up the blocking threads from time to time. */
> ldlm_bl_thread_wakeup();
>
> - return time;
> -}
> -
> -static int ldlm_pools_thread_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = (struct ptlrpc_thread *)arg;
> - int c_time;
> -
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(&thread->t_ctl_waitq);
> -
> - CDEBUG(D_DLMTRACE, "%s: pool thread starting, process %d\n",
> - "ldlm_poold", current_pid());
> -
> - while (1) {
> - /*
> - * Recal all pools on this tick.
> - */
> - c_time = ldlm_pools_recalc(LDLM_NAMESPACE_CLIENT);
> -
> - /*
> - * Wait until the next check time, or until we're
> - * stopped.
> - */
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread) ||
> - thread_is_event(thread),
> - c_time * HZ);
> -
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> - thread_test_and_clear_flags(thread, SVC_EVENT);
> - }
> -
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(&thread->t_ctl_waitq);
> -
> - CDEBUG(D_DLMTRACE, "%s: pool thread exiting, process %d\n",
> - "ldlm_poold", current_pid());
> -
> - complete_and_exit(&ldlm_pools_comp, 0);
> + schedule_delayed_work(&ldlm_recalc_pools, time * HZ);
> }
>
> static int ldlm_pools_thread_start(void)
> {
> - struct task_struct *task;
> -
> - if (ldlm_pools_thread)
> - return -EALREADY;
> -
> - ldlm_pools_thread = kzalloc(sizeof(*ldlm_pools_thread), GFP_NOFS);
> - if (!ldlm_pools_thread)
> - return -ENOMEM;
> -
> - init_completion(&ldlm_pools_comp);
> - init_waitqueue_head(&ldlm_pools_thread->t_ctl_waitq);
> + schedule_delayed_work(&ldlm_recalc_pools, 0);
>
> - task = kthread_run(ldlm_pools_thread_main, ldlm_pools_thread,
> - "ldlm_poold");
> - if (IS_ERR(task)) {
> - CERROR("Can't start pool thread, error %ld\n", PTR_ERR(task));
> - kfree(ldlm_pools_thread);
> - ldlm_pools_thread = NULL;
> - return PTR_ERR(task);
> - }
> - wait_event_idle(ldlm_pools_thread->t_ctl_waitq,
> - thread_is_running(ldlm_pools_thread));
> return 0;
> }
>
> static void ldlm_pools_thread_stop(void)
> {
> - if (!ldlm_pools_thread)
> - return;
> -
> - thread_set_flags(ldlm_pools_thread, SVC_STOPPING);
> - wake_up(&ldlm_pools_thread->t_ctl_waitq);
> -
> - /*
> - * Make sure that pools thread is finished before freeing @thread.
> - * This fixes possible race and oops due to accessing freed memory
> - * in pools thread.
> - */
> - wait_for_completion(&ldlm_pools_comp);
> - kfree(ldlm_pools_thread);
> - ldlm_pools_thread = NULL;
> + cancel_delayed_work_sync(&ldlm_recalc_pools);
> }
>
> static struct shrinker ldlm_pools_cli_shrinker = {
> @@ -1081,20 +1009,15 @@ int ldlm_pools_init(void)
> int rc;
>
> rc = ldlm_pools_thread_start();
> - if (rc)
> - return rc;
> -
> - rc = register_shrinker(&ldlm_pools_cli_shrinker);
> - if (rc)
> - ldlm_pools_thread_stop();
> + if (!rc)
> + rc = register_shrinker(&ldlm_pools_cli_shrinker);
>
> return rc;
> }
>
> void ldlm_pools_fini(void)
> {
> - if (ldlm_pools_thread)
> - unregister_shrinker(&ldlm_pools_cli_shrinker);
> + unregister_shrinker(&ldlm_pools_cli_shrinker);
>
> ldlm_pools_thread_stop();
> }
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 19:25:33

by Dilger, Andreas

Subject: Re: [PATCH 10/17] staging: lustre: ptlrpc: use delayed_work in sec_gc

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> The garbage collection for security contexts currently
> has a dedicated kthread which wakes up every 30 minutes
> to discard old garbage.
>
> Replace this with a simple delayed_work item on the
> system work queue.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/ptlrpc/sec_gc.c | 90 ++++++++-----------------
> 1 file changed, 28 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> index 48f1a72afd77..2c8bad7b7877 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> @@ -55,7 +55,6 @@ static spinlock_t sec_gc_list_lock;
> static LIST_HEAD(sec_gc_ctx_list);
> static spinlock_t sec_gc_ctx_list_lock;
>
> -static struct ptlrpc_thread sec_gc_thread;
> static atomic_t sec_gc_wait_del = ATOMIC_INIT(0);
>
> void sptlrpc_gc_add_sec(struct ptlrpc_sec *sec)
> @@ -139,86 +138,53 @@ static void sec_do_gc(struct ptlrpc_sec *sec)
> sec->ps_gc_next = ktime_get_real_seconds() + sec->ps_gc_interval;
> }
>
> -static int sec_gc_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = arg;
> -
> - unshare_fs_struct();
> +static void sec_gc_main(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(sec_gc_work, sec_gc_main);
>
> - /* Record that the thread is running */
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(&thread->t_ctl_waitq);
> -
> - while (1) {
> - struct ptlrpc_sec *sec;
> +static void sec_gc_main(struct work_struct *ws)
> +{
> + struct ptlrpc_sec *sec;
>
> - sec_process_ctx_list();
> + sec_process_ctx_list();
> again:
> - /* go through sec list do gc.
> - * FIXME here we iterate through the whole list each time which
> - * is not optimal. we perhaps want to use balanced binary tree
> - * to trace each sec as order of expiry time.
> - * another issue here is we wakeup as fixed interval instead of
> - * according to each sec's expiry time
> + /* go through sec list do gc.
> + * FIXME here we iterate through the whole list each time which
> + * is not optimal. we perhaps want to use balanced binary tree
> + * to trace each sec as order of expiry time.
> + * another issue here is we wakeup as fixed interval instead of
> + * according to each sec's expiry time
> + */
> + mutex_lock(&sec_gc_mutex);
> + list_for_each_entry(sec, &sec_gc_list, ps_gc_list) {
> + /* if someone is waiting to be deleted, let it
> + * proceed as soon as possible.
> */
> - mutex_lock(&sec_gc_mutex);
> - list_for_each_entry(sec, &sec_gc_list, ps_gc_list) {
> - /* if someone is waiting to be deleted, let it
> - * proceed as soon as possible.
> - */
> - if (atomic_read(&sec_gc_wait_del)) {
> - CDEBUG(D_SEC, "deletion pending, start over\n");
> - mutex_unlock(&sec_gc_mutex);
> - goto again;
> - }
> -
> - sec_do_gc(sec);
> + if (atomic_read(&sec_gc_wait_del)) {
> + CDEBUG(D_SEC, "deletion pending, start over\n");
> + mutex_unlock(&sec_gc_mutex);
> + goto again;
> }
> - mutex_unlock(&sec_gc_mutex);
> -
> - /* check ctx list again before sleep */
> - sec_process_ctx_list();
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread),
> - SEC_GC_INTERVAL * HZ);
>
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> + sec_do_gc(sec);
> }
> + mutex_unlock(&sec_gc_mutex);
>
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(&thread->t_ctl_waitq);
> - return 0;
> + /* check ctx list again before sleep */
> + sec_process_ctx_list();
> + schedule_delayed_work(&sec_gc_work, SEC_GC_INTERVAL * HZ);
> }
>
> int sptlrpc_gc_init(void)
> {
> - struct task_struct *task;
> -
> mutex_init(&sec_gc_mutex);
> spin_lock_init(&sec_gc_list_lock);
> spin_lock_init(&sec_gc_ctx_list_lock);
>
> - /* initialize thread control */
> - memset(&sec_gc_thread, 0, sizeof(sec_gc_thread));
> - init_waitqueue_head(&sec_gc_thread.t_ctl_waitq);
> -
> - task = kthread_run(sec_gc_main, &sec_gc_thread, "sptlrpc_gc");
> - if (IS_ERR(task)) {
> - CERROR("can't start gc thread: %ld\n", PTR_ERR(task));
> - return PTR_ERR(task);
> - }
> -
> - wait_event_idle(sec_gc_thread.t_ctl_waitq,
> - thread_is_running(&sec_gc_thread));
> + schedule_delayed_work(&sec_gc_work, 0);
> return 0;
> }
>
> void sptlrpc_gc_fini(void)
> {
> - thread_set_flags(&sec_gc_thread, SVC_STOPPING);
> - wake_up(&sec_gc_thread.t_ctl_waitq);
> -
> - wait_event_idle(sec_gc_thread.t_ctl_waitq,
> - thread_is_stopped(&sec_gc_thread));
> + cancel_delayed_work_sync(&sec_gc_work);
> }
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 23:55:10

by Dilger, Andreas

Subject: Re: [PATCH 11/17] staging: lustre: ptlrpc: use workqueue for pinger

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> lustre has a "Pinger" kthread which periodically pings peers
> to ensure all hosts are functioning.
>
> This can more easily be done using a work queue.
>
> As maintaining contact with other peers is important for
> keeping the filesystem running, and as the filesystem might
> be involved in freeing memory, it is safest to have a
> separate WQ_MEM_RECLAIM workqueue.
>
> The SVC_EVENT functionality to wake up the thread can be
> replaced with mod_delayed_work().
>
> Also use round_jiffies_up_relative() rather than setting a
> minimum of 1 second delay. The PING_INTERVAL is measured in
> seconds, so this meets the need and allows the workqueue to
> keep wakeups synchronized.
>
> Signed-off-by: NeilBrown <[email protected]>

Looks reasonable. Fortunately, pinging the server does not need
to be very accurate since it is only done occasionally when the
client is otherwise idle, so it shouldn't matter if the workqueue
operation is delayed by a few seconds.

Reviewed-by: Andreas Dilger <[email protected]>
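The pinger differs from the previous two conversions in two ways: it
gets its own WQ_MEM_RECLAIM workqueue rather than the system one, and
the old SVC_EVENT wakeup becomes mod_delayed_work(). A kernel-style
sketch of just those two pieces (names and the ping_all_imports()
helper are illustrative):

```c
#include <linux/workqueue.h>

static struct workqueue_struct *ping_wq;
static void ping_fn(struct work_struct *ws);
static DECLARE_DELAYED_WORK(ping_work, ping_fn);

extern long ping_all_imports(void);	/* illustrative: jiffies to next ping */

static void ping_fn(struct work_struct *ws)
{
	long next = ping_all_imports();

	/* round up to a jiffy boundary so periodic wakeups batch together */
	queue_delayed_work(ping_wq, &ping_work,
			   round_jiffies_up_relative(next));
}

static int ping_start(void)
{
	/* WQ_MEM_RECLAIM guarantees a rescuer thread, so the ping still
	 * runs even when worker threads cannot be forked under memory
	 * pressure - important if the filesystem is freeing memory */
	ping_wq = alloc_workqueue("ping_example", WQ_MEM_RECLAIM, 1);
	if (!ping_wq)
		return -ENOMEM;
	queue_delayed_work(ping_wq, &ping_work, 0);
	return 0;
}

static void ping_wake_up(void)
{
	/* replaces SVC_EVENT + wake_up(): pull the timer forward to now */
	mod_delayed_work(ping_wq, &ping_work, 0);
}
```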

> ---
> drivers/staging/lustre/lustre/ptlrpc/pinger.c | 81 +++++++------------------
> 1 file changed, 24 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> index b5f3cfee8e75..0775b7a048bb 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> @@ -217,21 +217,18 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
> }
> }
>
> -static int ptlrpc_pinger_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = arg;
> -
> - /* Record that the thread is running */
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(&thread->t_ctl_waitq);
> +static struct workqueue_struct *pinger_wq;
> +static void ptlrpc_pinger_main(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>
> - /* And now, loop forever, pinging as needed. */
> - while (1) {
> - unsigned long this_ping = cfs_time_current();
> - long time_to_next_wake;
> - struct timeout_item *item;
> - struct obd_import *imp;
> +static void ptlrpc_pinger_main(struct work_struct *ws)
> +{
> + unsigned long this_ping = cfs_time_current();
> + long time_to_next_wake;
> + struct timeout_item *item;
> + struct obd_import *imp;
>
> + do {
> mutex_lock(&pinger_mutex);
> list_for_each_entry(item, &timeout_list, ti_chain) {
> item->ti_cb(item, item->ti_cb_data);
> @@ -260,50 +257,24 @@ static int ptlrpc_pinger_main(void *arg)
> time_to_next_wake,
> cfs_time_add(this_ping,
> PING_INTERVAL * HZ));
> - if (time_to_next_wake > 0) {
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread) ||
> - thread_is_event(thread),
> - max_t(long, time_to_next_wake, HZ));
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> - /* woken after adding import to reset timer */
> - thread_test_and_clear_flags(thread, SVC_EVENT);
> - }
> - }
> + } while (time_to_next_wake <= 0);
>
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(&thread->t_ctl_waitq);
> -
> - CDEBUG(D_NET, "pinger thread exiting, process %d\n", current_pid());
> - return 0;
> + queue_delayed_work(pinger_wq, &ping_work,
> + round_jiffies_up_relative(time_to_next_wake));
> }
>
> -static struct ptlrpc_thread pinger_thread;
> -
> int ptlrpc_start_pinger(void)
> {
> - struct task_struct *task;
> - int rc;
> -
> - if (!thread_is_init(&pinger_thread) &&
> - !thread_is_stopped(&pinger_thread))
> + if (pinger_wq)
> return -EALREADY;
>
> - init_waitqueue_head(&pinger_thread.t_ctl_waitq);
> -
> - strcpy(pinger_thread.t_name, "ll_ping");
> -
> - task = kthread_run(ptlrpc_pinger_main, &pinger_thread,
> - pinger_thread.t_name);
> - if (IS_ERR(task)) {
> - rc = PTR_ERR(task);
> - CERROR("cannot start pinger thread: rc = %d\n", rc);
> - return rc;
> + pinger_wq = alloc_workqueue("ptlrpc_pinger", WQ_MEM_RECLAIM, 1);
> + if (!pinger_wq) {
> + CERROR("cannot start pinger workqueue\n");
> + return -ENOMEM;
> }
> - wait_event_idle(pinger_thread.t_ctl_waitq,
> - thread_is_running(&pinger_thread));
>
> + queue_delayed_work(pinger_wq, &ping_work, 0);
> return 0;
> }
>
> @@ -313,16 +284,13 @@ int ptlrpc_stop_pinger(void)
> {
> int rc = 0;
>
> - if (thread_is_init(&pinger_thread) ||
> - thread_is_stopped(&pinger_thread))
> + if (!pinger_wq)
> return -EALREADY;
>
> ptlrpc_pinger_remove_timeouts();
> - thread_set_flags(&pinger_thread, SVC_STOPPING);
> - wake_up(&pinger_thread.t_ctl_waitq);
> -
> - wait_event_idle(pinger_thread.t_ctl_waitq,
> - thread_is_stopped(&pinger_thread));
> + cancel_delayed_work_sync(&ping_work);
> + destroy_workqueue(pinger_wq);
> + pinger_wq = NULL;
>
> return rc;
> }
> @@ -505,6 +473,5 @@ static int ptlrpc_pinger_remove_timeouts(void)
>
> void ptlrpc_pinger_wake_up(void)
> {
> - thread_add_flags(&pinger_thread, SVC_EVENT);
> - wake_up(&pinger_thread.t_ctl_waitq);
> + mod_delayed_work(pinger_wq, &ping_work, 0);
> }
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-08 23:56:48

by Dilger, Andreas

Subject: Re: [PATCH 12/17] staging: lustre: remove unused flag from ptlrpc_thread

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> SVC_EVENT is no longer used.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lustre/include/lustre_net.h | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
> index 5a4434e7c85a..108683c54127 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_net.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_net.h
> @@ -1259,7 +1259,6 @@ enum {
> SVC_STOPPING = 1 << 1,
> SVC_STARTING = 1 << 2,
> SVC_RUNNING = 1 << 3,
> - SVC_EVENT = 1 << 4,
> };
>
> #define PTLRPC_THR_NAME_LEN 32
> @@ -1302,11 +1301,6 @@ struct ptlrpc_thread {
> char t_name[PTLRPC_THR_NAME_LEN];
> };
>
> -static inline int thread_is_init(struct ptlrpc_thread *thread)
> -{
> - return thread->t_flags == 0;
> -}
> -
> static inline int thread_is_stopped(struct ptlrpc_thread *thread)
> {
> return !!(thread->t_flags & SVC_STOPPED);
> @@ -1327,11 +1321,6 @@ static inline int thread_is_running(struct ptlrpc_thread *thread)
> return !!(thread->t_flags & SVC_RUNNING);
> }
>
> -static inline int thread_is_event(struct ptlrpc_thread *thread)
> -{
> - return !!(thread->t_flags & SVC_EVENT);
> -}
> -
> static inline void thread_clear_flags(struct ptlrpc_thread *thread, __u32 flags)
> {
> thread->t_flags &= ~flags;
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-09 00:13:29

by Dilger, Andreas

Subject: Re: [PATCH 13/17] staging: lustre: remove 'ptlrpc_thread usage' for sai_agl_thread

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Lustre has a 'struct ptlrpc_thread' which provides
> control functionality wrapped around kthreads.
> None of the functionality used in statahead.c requires
> ptlrpc_thread - it can all be done directly with kthreads.
>
> So discard the ptlrpc_thread and just use a task_struct directly.
>
> One particular change worth noting is that in the current
> code, the thread performs some start-up actions and then
> signals that it is ready to go. In the new code, the thread
> is first created, then the startup actions are perform, then
> the thread is woken up. This means there is no need to wait
> any more than kthread_create() already waits.
>
> Signed-off-by: NeilBrown <[email protected]>

Looks reasonable, but one minor comment inline below. Not enough to
make me think the patch is bad, just a minor inefficiency...

Reviewed-by: Andreas Dilger <[email protected]>

I've also CC'd Fan Yong, who is the author of this code in case
he has any comments.
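The create-then-setup-then-wake ordering the commit message describes
can be sketched as follows (a rough illustration, not the statahead
code; struct my_ctx, worker_fn() and get_ctx_reference() are
placeholder names):

```c
#include <linux/kthread.h>
#include <linux/err.h>

struct my_ctx;					/* illustrative context */
extern int worker_fn(void *arg);		/* the thread body */
extern void get_ctx_reference(struct my_ctx *ctx);

static int start_worker(struct my_ctx *ctx, struct task_struct **taskp)
{
	struct task_struct *task;

	/* kthread_create() returns a dormant thread: it exists but
	 * will not run until wake_up_process() */
	task = kthread_create(worker_fn, ctx, "my_worker");
	if (IS_ERR(task))
		return PTR_ERR(task);

	/* perform the start-up actions while the thread cannot race
	 * with us: publish the task pointer and take the reference
	 * the thread will own */
	*taskp = task;
	get_ctx_reference(ctx);

	wake_up_process(task);	/* now let it run */
	return 0;
}
```

Because the thread cannot observe anything before wake_up_process(),
no "I have started" handshake back from the thread is needed, which is
why the wait_event on a ptlrpc_thread control waitqueue disappears.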

> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index ba00881a5745..39241b952bf4 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -861,35 +860,13 @@ static int ll_agl_thread(void *arg)
> struct inode *dir = d_inode(parent);
> struct ll_inode_info *plli = ll_i2info(dir);
> struct ll_inode_info *clli;
> - struct ll_sb_info *sbi = ll_i2sbi(dir);
> - struct ll_statahead_info *sai;
> - struct ptlrpc_thread *thread;
> + /* We already own this reference, so it is safe to take it without a lock. */
> + struct ll_statahead_info *sai = plli->lli_sai;
>
> - sai = ll_sai_get(dir);

Here we used to grab a reference to "sai" from the directory, but you
get it in the calling thread now...

> @@ -937,16 +917,22 @@ static void ll_start_agl(struct dentry *parent, struct ll_statahead_info *sai)
> sai, parent);
>
> plli = ll_i2info(d_inode(parent));
> + task = kthread_create(ll_agl_thread, parent, "ll_agl_%u",
> + plli->lli_opendir_pid);
> if (IS_ERR(task)) {
> CERROR("can't start ll_agl thread, rc: %ld\n", PTR_ERR(task));
> return;
> }
>
> + sai->sai_agl_task = task;
> + atomic_inc(&ll_i2sbi(d_inode(parent))->ll_agl_total);
> + spin_lock(&plli->lli_agl_lock);
> + sai->sai_agl_valid = 1;
> + spin_unlock(&plli->lli_agl_lock);
> + /* Get an extra reference that the thread holds */
> + ll_sai_get(d_inode(parent));

Here you get the extra reference, but we already have the pointer to
"sai", so going through "parent->d_inode->lli->lli_sai" to get "sai"
again seems convoluted. One option is atomic_inc(&sai->sai_refcount),
but given that this is done only once per "ls" call I don't think it
is a huge deal, and not more work than was done before.

Cheers, Andreas

> +
> + wake_up_process(task);
> }
>
> /* statahead thread main function */
> @@ -958,7 +944,6 @@ static int ll_statahead_thread(void *arg)
> struct ll_sb_info *sbi = ll_i2sbi(dir);
> struct ll_statahead_info *sai;
> struct ptlrpc_thread *sa_thread;
> - struct ptlrpc_thread *agl_thread;
> struct page *page = NULL;
> __u64 pos = 0;
> int first = 0;
> @@ -967,7 +952,6 @@ static int ll_statahead_thread(void *arg)
>
> sai = ll_sai_get(dir);
> sa_thread = &sai->sai_thread;
> - agl_thread = &sai->sai_agl_thread;
> sa_thread->t_pid = current_pid();
> CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n",
> sai, parent);
> @@ -1129,21 +1113,13 @@ static int ll_statahead_thread(void *arg)
> sa_handle_callback(sai);
> }
> out:
> - if (sai->sai_agl_valid) {
> - spin_lock(&lli->lli_agl_lock);
> - thread_set_flags(agl_thread, SVC_STOPPING);
> - spin_unlock(&lli->lli_agl_lock);
> - wake_up(&agl_thread->t_ctl_waitq);
> + if (sai->sai_agl_task) {
> + kthread_stop(sai->sai_agl_task);
>
> CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
> - sai, (unsigned int)agl_thread->t_pid);
> - wait_event_idle(agl_thread->t_ctl_waitq,
> - thread_is_stopped(agl_thread));
> - } else {
> - /* Set agl_thread flags anyway. */
> - thread_set_flags(agl_thread, SVC_STOPPED);
> + sai, (unsigned int)sai->sai_agl_task->pid);
> + sai->sai_agl_task = NULL;
> }
> -
> /*
> * wait for inflight statahead RPCs to finish, and then we can free sai
> * safely because statahead RPC will access sai data
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-09 00:21:42

by Dilger, Andreas

Subject: Re: [PATCH 14/17] staging: lustre: change sai_thread to sai_task.


> On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Rather than allocating a ptlrpc_thread for the
> stat-ahead thread, just use the task_struct provided
> by kthreads directly.
>
> As nothing ever waits for the sai_task, it must call do_exit()
> directly rather than simply return from the function.
> Also it cannot use kthread_should_stop() to know when to stop.
>
> There is one caller which can ask it to stop so we need a simple
> signaling mechanism. I've chosen to set ->sai_task to NULL
> when the thread should finish up. The thread notices this and
> cleans up and exits.
> lli_sa_lock is used to avoid races between waking up the process
> and the process exiting.
>
> Signed-off-by: NeilBrown <[email protected]>

CC'd Fan Yong, who is the author for this code.

Reviewed-by: Andreas Dilger <[email protected]>
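The waitqueue-less sleep/wake protocol the commit message describes is
worth spelling out, since it relies on a subtle property of
wake_up_process(). A kernel-style sketch (struct my_ctx and the
has_pending()/handle_pending() helpers are illustrative):

```c
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

struct my_ctx {				/* illustrative */
	spinlock_t		lock;
	struct task_struct	*task;	/* NULL means "please stop" */
};

extern int has_pending(struct my_ctx *ctx);	/* illustrative helpers */
extern void handle_pending(struct my_ctx *ctx);

static int worker_fn(void *arg)
{
	struct my_ctx *ctx = arg;

	while (ctx->task) {
		handle_pending(ctx);

		/* re-check the condition after setting TASK_IDLE: a
		 * racing wake_up_process() flips us back to
		 * TASK_RUNNING, so the wakeup cannot be lost */
		set_current_state(TASK_IDLE);
		if (!has_pending(ctx) && ctx->task)
			schedule();
		__set_current_state(TASK_RUNNING);
	}
	do_exit(0);		/* nothing waits for this thread */
}

static void poke_worker(struct my_ctx *ctx)
{
	/* the lock orders clearing ctx->task against this wakeup, so
	 * we never call wake_up_process() on an exited task */
	spin_lock(&ctx->lock);
	if (ctx->task)
		wake_up_process(ctx->task);
	spin_unlock(&ctx->lock);
}
```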

> ---
> .../staging/lustre/lustre/llite/llite_internal.h | 2
> drivers/staging/lustre/lustre/llite/statahead.c | 118 +++++++++-----------
> 2 files changed, 54 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
> index 0c2d717fd526..d46bcf71b273 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_internal.h
> +++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
> @@ -1070,7 +1070,7 @@ struct ll_statahead_info {
> sai_agl_valid:1,/* AGL is valid for the dir */
> sai_in_readpage:1;/* statahead in readdir() */
> wait_queue_head_t sai_waitq; /* stat-ahead wait queue */
> - struct ptlrpc_thread sai_thread; /* stat-ahead thread */
> + struct task_struct *sai_task; /* stat-ahead thread */
> struct task_struct *sai_agl_task; /* AGL thread */
> struct list_head sai_interim_entries; /* entries which got async
> * stat reply, but not
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 39241b952bf4..155ce3cf6f60 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -267,7 +267,7 @@ sa_kill(struct ll_statahead_info *sai, struct sa_entry *entry)
>
> /* called by scanner after use, sa_entry will be killed */
> static void
> -sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
> +sa_put(struct ll_statahead_info *sai, struct sa_entry *entry, struct ll_inode_info *lli)
> {
> struct sa_entry *tmp, *next;
>
> @@ -295,7 +295,11 @@ sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
> sa_kill(sai, tmp);
> }
>
> - wake_up(&sai->sai_thread.t_ctl_waitq);
> + spin_lock(&lli->lli_sa_lock);
> + if (sai->sai_task)
> + wake_up_process(sai->sai_task);
> + spin_unlock(&lli->lli_sa_lock);
> +
> }
>
> /*
> @@ -403,7 +407,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry)
> sai->sai_max = LL_SA_RPC_MIN;
> sai->sai_index = 1;
> init_waitqueue_head(&sai->sai_waitq);
> - init_waitqueue_head(&sai->sai_thread.t_ctl_waitq);
>
> INIT_LIST_HEAD(&sai->sai_interim_entries);
> INIT_LIST_HEAD(&sai->sai_entries);
> @@ -465,7 +468,7 @@ static void ll_sai_put(struct ll_statahead_info *sai)
> lli->lli_sai = NULL;
> spin_unlock(&lli->lli_sa_lock);
>
> - LASSERT(thread_is_stopped(&sai->sai_thread));
> + LASSERT(sai->sai_task == NULL);
> LASSERT(sai->sai_agl_task == NULL);
> LASSERT(sai->sai_sent == sai->sai_replied);
> LASSERT(!sa_has_callback(sai));
> @@ -646,7 +649,6 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> struct ll_inode_info *lli = ll_i2info(dir);
> struct ll_statahead_info *sai = lli->lli_sai;
> struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata;
> - wait_queue_head_t *waitq = NULL;
> __u64 handle = 0;
>
> if (it_disposition(it, DISP_LOOKUP_NEG))
> @@ -657,7 +659,6 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> * sai should be always valid, no need to refcount
> */
> LASSERT(sai);
> - LASSERT(!thread_is_stopped(&sai->sai_thread));
> LASSERT(entry);
>
> CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
> @@ -681,8 +682,9 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> spin_lock(&lli->lli_sa_lock);
> if (rc) {
> if (__sa_make_ready(sai, entry, rc))
> - waitq = &sai->sai_waitq;
> + wake_up(&sai->sai_waitq);
> } else {
> + int first = 0;
> entry->se_minfo = minfo;
> entry->se_req = ptlrpc_request_addref(req);
> /*
> @@ -693,14 +695,15 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> */
> entry->se_handle = handle;
> if (!sa_has_callback(sai))
> - waitq = &sai->sai_thread.t_ctl_waitq;
> + first = 1;
>
> list_add_tail(&entry->se_list, &sai->sai_interim_entries);
> +
> + if (first && sai->sai_task)
> + wake_up_process(sai->sai_task);
> }
> sai->sai_replied++;
>
> - if (waitq)
> - wake_up(waitq);
> spin_unlock(&lli->lli_sa_lock);
>
> return rc;
> @@ -942,17 +945,13 @@ static int ll_statahead_thread(void *arg)
> struct inode *dir = d_inode(parent);
> struct ll_inode_info *lli = ll_i2info(dir);
> struct ll_sb_info *sbi = ll_i2sbi(dir);
> - struct ll_statahead_info *sai;
> - struct ptlrpc_thread *sa_thread;
> + struct ll_statahead_info *sai = lli->lli_sai;
> struct page *page = NULL;
> __u64 pos = 0;
> int first = 0;
> int rc = 0;
> struct md_op_data *op_data;
>
> - sai = ll_sai_get(dir);
> - sa_thread = &sai->sai_thread;
> - sa_thread->t_pid = current_pid();
> CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n",
> sai, parent);
>
> @@ -965,21 +964,7 @@ static int ll_statahead_thread(void *arg)
>
> op_data->op_max_pages = ll_i2sbi(dir)->ll_md_brw_pages;
>
> - if (sbi->ll_flags & LL_SBI_AGL_ENABLED)
> - ll_start_agl(parent, sai);
> -
> - atomic_inc(&sbi->ll_sa_total);
> - spin_lock(&lli->lli_sa_lock);
> - if (thread_is_init(sa_thread))
> - /* If someone else has changed the thread state
> - * (e.g. already changed to SVC_STOPPING), we can't just
> - * blindly overwrite that setting.
> - */
> - thread_set_flags(sa_thread, SVC_RUNNING);
> - spin_unlock(&lli->lli_sa_lock);
> - wake_up(&sa_thread->t_ctl_waitq);
> -
> - while (pos != MDS_DIR_END_OFF && thread_is_running(sa_thread)) {
> + while (pos != MDS_DIR_END_OFF && sai->sai_task) {
> struct lu_dirpage *dp;
> struct lu_dirent *ent;
>
> @@ -996,7 +981,7 @@ static int ll_statahead_thread(void *arg)
>
> dp = page_address(page);
> for (ent = lu_dirent_start(dp);
> - ent && thread_is_running(sa_thread) && !sa_low_hit(sai);
> + ent && sai->sai_task && !sa_low_hit(sai);
> ent = lu_dirent_next(ent)) {
> struct lu_fid fid;
> __u64 hash;
> @@ -1046,13 +1031,7 @@ static int ll_statahead_thread(void *arg)
>
> fid_le_to_cpu(&fid, &ent->lde_fid);
>
> - /* wait for spare statahead window */
> do {
> - wait_event_idle(sa_thread->t_ctl_waitq,
> - !sa_sent_full(sai) ||
> - sa_has_callback(sai) ||
> - !list_empty(&sai->sai_agls) ||
> - !thread_is_running(sa_thread));
> sa_handle_callback(sai);
>
> spin_lock(&lli->lli_agl_lock);
> @@ -1072,8 +1051,16 @@ static int ll_statahead_thread(void *arg)
> spin_lock(&lli->lli_agl_lock);
> }
> spin_unlock(&lli->lli_agl_lock);
> - } while (sa_sent_full(sai) &&
> - thread_is_running(sa_thread));
> +
> + set_current_state(TASK_IDLE);
> + if (sa_sent_full(sai) &&
> + !sa_has_callback(sai) &&
> + agl_list_empty(sai) &&
> + sai->sai_task)
> + /* wait for spare statahead window */
> + schedule();
> + __set_current_state(TASK_RUNNING);
> + } while (sa_sent_full(sai) && sai->sai_task);
>
> sa_statahead(parent, name, namelen, &fid);
> }
> @@ -1096,7 +1083,7 @@ static int ll_statahead_thread(void *arg)
>
> if (rc < 0) {
> spin_lock(&lli->lli_sa_lock);
> - thread_set_flags(sa_thread, SVC_STOPPING);
> + sai->sai_task = NULL;
> lli->lli_sa_enabled = 0;
> spin_unlock(&lli->lli_sa_lock);
> }
> @@ -1105,12 +1092,14 @@ static int ll_statahead_thread(void *arg)
> * statahead is finished, but statahead entries need to be cached, wait
> * for file release to stop me.
> */
> - while (thread_is_running(sa_thread)) {
> - wait_event_idle(sa_thread->t_ctl_waitq,
> - sa_has_callback(sai) ||
> - !thread_is_running(sa_thread));
> -
> + while (sai->sai_task) {
> sa_handle_callback(sai);
> +
> + set_current_state(TASK_IDLE);
> + if (!sa_has_callback(sai) &&
> + sai->sai_task)
> + schedule();
> + __set_current_state(TASK_RUNNING);
> }
> out:
> if (sai->sai_agl_task) {
> @@ -1126,26 +1115,23 @@ static int ll_statahead_thread(void *arg)
> */
> while (sai->sai_sent != sai->sai_replied) {
> /* in case we're not woken up, timeout wait */
> - wait_event_idle_timeout(sa_thread->t_ctl_waitq,
> - sai->sai_sent == sai->sai_replied,
> - HZ>>3);
> + schedule_timeout_idle(HZ>>3);
> }
>
> /* release resources held by statahead RPCs */
> sa_handle_callback(sai);
>
> - spin_lock(&lli->lli_sa_lock);
> - thread_set_flags(sa_thread, SVC_STOPPED);
> - spin_unlock(&lli->lli_sa_lock);
> -
> CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n",
> sai, parent);
>
> + spin_lock(&lli->lli_sa_lock);
> + sai->sai_task = NULL;
> + spin_unlock(&lli->lli_sa_lock);
> +
> wake_up(&sai->sai_waitq);
> - wake_up(&sa_thread->t_ctl_waitq);
> ll_sai_put(sai);
>
> - return rc;
> + do_exit(rc);
> }
>
> /* authorize opened dir handle @key to statahead */
> @@ -1187,13 +1173,13 @@ void ll_deauthorize_statahead(struct inode *dir, void *key)
> lli->lli_opendir_pid = 0;
> lli->lli_sa_enabled = 0;
> sai = lli->lli_sai;
> - if (sai && thread_is_running(&sai->sai_thread)) {
> + if (sai && sai->sai_task) {
> /*
> * statahead thread may not quit yet because it needs to cache
> * entries, now it's time to tell it to quit.
> */
> - thread_set_flags(&sai->sai_thread, SVC_STOPPING);
> - wake_up(&sai->sai_thread.t_ctl_waitq);
> + wake_up_process(sai->sai_task);
> + sai->sai_task = NULL;
> }
> spin_unlock(&lli->lli_sa_lock);
> }
> @@ -1463,7 +1449,7 @@ static int revalidate_statahead_dentry(struct inode *dir,
> */
> ldd = ll_d2d(*dentryp);
> ldd->lld_sa_generation = lli->lli_sa_generation;
> - sa_put(sai, entry);
> + sa_put(sai, entry, lli);
> return rc;
> }
>
> @@ -1483,7 +1469,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry)
> {
> struct ll_inode_info *lli = ll_i2info(dir);
> struct ll_statahead_info *sai = NULL;
> - struct ptlrpc_thread *thread;
> struct task_struct *task;
> struct dentry *parent = dentry->d_parent;
> int rc;
> @@ -1523,18 +1508,21 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry)
> CDEBUG(D_READA, "start statahead thread: [pid %d] [parent %pd]\n",
> current_pid(), parent);
>
> - task = kthread_run(ll_statahead_thread, parent, "ll_sa_%u",
> - lli->lli_opendir_pid);
> - thread = &sai->sai_thread;
> + task = kthread_create(ll_statahead_thread, parent, "ll_sa_%u",
> + lli->lli_opendir_pid);
> if (IS_ERR(task)) {
> rc = PTR_ERR(task);
> CERROR("can't start ll_sa thread, rc : %d\n", rc);
> goto out;
> }
>
> - wait_event_idle(thread->t_ctl_waitq,
> - thread_is_running(thread) || thread_is_stopped(thread));
> - ll_sai_put(sai);
> + if (ll_i2sbi(parent->d_inode)->ll_flags & LL_SBI_AGL_ENABLED)
> + ll_start_agl(parent, sai);
> +
> + atomic_inc(&ll_i2sbi(parent->d_inode)->ll_sa_total);
> + sai->sai_task = task;
> +
> + wake_up_process(task);
>
> /*
> * We don't stat-ahead for the first dirent since we are already in
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








2018-03-09 00:33:00

by Dilger, Andreas

Subject: Re: [PATCH 15/17] staging: lustre: ptlrpc: move thread creation out of module initialization

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> When the ptlrpc module is loaded, it starts the pinger thread and
> calls LNetNIInit which starts various threads.
>
> We don't need these threads until the module is actually being
> used, such as when a lustre filesystem is mounted.
>
> So move the thread creation into new ptlrpc_inc_ref() (modeled on
> ptlrpcd_inc_ref()), and call that when needed, such as at mount time.

It looks like this is still done early enough in the mount sequence, so the
earlier "[06/17] get entropy from nid when nid set" action is still done
before the client UUID is generated in ll_init_sbi().

Reviewed-by: Andreas Dilger <[email protected]>
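The first-user-starts / last-user-stops shape of
ptlrpc_inc_ref()/ptlrpc_dec_ref() can be modeled in userspace; in the
following rough sketch the function names mirror the patch, but the
service_running flag is a stand-in for the pinger/LNet threads and the
error path is elided:

```c
#include <pthread.h>

static pthread_mutex_t ref_mutex = PTHREAD_MUTEX_INITIALIZER;
static int refcount;
static int service_running;	/* stands in for the started threads */

int ptlrpc_inc_ref(void)
{
	int rc = 0;

	pthread_mutex_lock(&ref_mutex);
	if (refcount == 0)
		service_running = 1;	/* first user: start the service */
	if (rc == 0)
		refcount++;		/* only count on successful start */
	pthread_mutex_unlock(&ref_mutex);
	return rc;
}

void ptlrpc_dec_ref(void)
{
	pthread_mutex_lock(&ref_mutex);
	if (--refcount == 0)
		service_running = 0;	/* last user: stop the service */
	pthread_mutex_unlock(&ref_mutex);
}
```

Mount takes a reference, umount drops it, so the threads only exist
while at least one filesystem is mounted.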

> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/include/lustre_net.h | 3 +
> drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 12 ++++
> drivers/staging/lustre/lustre/llite/llite_lib.c | 18 +++++-
> .../staging/lustre/lustre/ptlrpc/ptlrpc_module.c | 56 +++++++++++++-------
> 4 files changed, 65 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
> index 108683c54127..d35ae0cda8d2 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_net.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_net.h
> @@ -1804,6 +1804,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd);
> */
> void ptlrpc_request_committed(struct ptlrpc_request *req, int force);
>
> +int ptlrpc_inc_ref(void);
> +void ptlrpc_dec_ref(void);
> +
> void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
> struct ptlrpc_client *);
> struct ptlrpc_connection *ptlrpc_uuid_to_connection(struct obd_uuid *uuid);
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> index 58913e628124..c772c68e5a49 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> @@ -869,6 +869,10 @@ int ldlm_get_ref(void)
> {
> int rc = 0;
>
> + rc = ptlrpc_inc_ref();
> + if (rc)
> + return rc;
> +
> mutex_lock(&ldlm_ref_mutex);
> if (++ldlm_refcount == 1) {
> rc = ldlm_setup();
> @@ -877,14 +881,18 @@ int ldlm_get_ref(void)
> }
> mutex_unlock(&ldlm_ref_mutex);
>
> + if (rc)
> + ptlrpc_dec_ref();
> +
> return rc;
> }
>
> void ldlm_put_ref(void)
> {
> + int rc = 0;
> mutex_lock(&ldlm_ref_mutex);
> if (ldlm_refcount == 1) {
> - int rc = ldlm_cleanup();
> + rc = ldlm_cleanup();
>
> if (rc)
> CERROR("ldlm_cleanup failed: %d\n", rc);
> @@ -894,6 +902,8 @@ void ldlm_put_ref(void)
> ldlm_refcount--;
> }
> mutex_unlock(&ldlm_ref_mutex);
> + if (!rc)
> + ptlrpc_dec_ref();
> }
>
> static ssize_t cancel_unused_locks_before_replay_show(struct kobject *kobj,
> diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
> index 844182ad7dd7..706b14bf8981 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_lib.c
> +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
> @@ -879,9 +879,15 @@ int ll_fill_super(struct super_block *sb)
>
> CDEBUG(D_VFSTRACE, "VFS Op: sb %p\n", sb);
>
> + err = ptlrpc_inc_ref();
> + if (err)
> + return err;
> +
> cfg = kzalloc(sizeof(*cfg), GFP_NOFS);
> - if (!cfg)
> - return -ENOMEM;
> + if (!cfg) {
> + err = -ENOMEM;
> + goto out_put;
> + }
>
> try_module_get(THIS_MODULE);
>
> @@ -891,7 +897,8 @@ int ll_fill_super(struct super_block *sb)
> if (!sbi) {
> module_put(THIS_MODULE);
> kfree(cfg);
> - return -ENOMEM;
> + err = -ENOMEM;
> + goto out_put;
> }
>
> err = ll_options(lsi->lsi_lmd->lmd_opts, &sbi->ll_flags);
> @@ -958,6 +965,9 @@ int ll_fill_super(struct super_block *sb)
> LCONSOLE_WARN("Mounted %s\n", profilenm);
>
> kfree(cfg);
> +out_put:
> + if (err)
> + ptlrpc_dec_ref();
> return err;
> } /* ll_fill_super */
>
> @@ -1028,6 +1038,8 @@ void ll_put_super(struct super_block *sb)
> cl_env_cache_purge(~0);
>
> module_put(THIS_MODULE);
> +
> + ptlrpc_dec_ref();
> } /* client_put_super */
>
> struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> index 131fc6d9646e..38923418669f 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> @@ -45,6 +45,42 @@ extern spinlock_t ptlrpc_last_xid_lock;
> extern spinlock_t ptlrpc_rs_debug_lock;
> #endif
>
> +DEFINE_MUTEX(ptlrpc_startup);
> +static int ptlrpc_active = 0;
> +
> +int ptlrpc_inc_ref(void)
> +{
> + int rc = 0;
> +
> + mutex_lock(&ptlrpc_startup);
> + if (ptlrpc_active++ == 0) {
> + ptlrpc_put_connection_superhack = ptlrpc_connection_put;
> +
> + rc = ptlrpc_init_portals();
> + if (!rc) {
> + rc = ptlrpc_start_pinger();
> + if (rc)
> + ptlrpc_exit_portals();
> + }
> + if (rc)
> + ptlrpc_active--;
> + }
> + mutex_unlock(&ptlrpc_startup);
> + return rc;
> +}
> +EXPORT_SYMBOL(ptlrpc_inc_ref);
> +
> +void ptlrpc_dec_ref(void)
> +{
> + mutex_lock(&ptlrpc_startup);
> + if (--ptlrpc_active == 0) {
> + ptlrpc_stop_pinger();
> + ptlrpc_exit_portals();
> + }
> + mutex_unlock(&ptlrpc_startup);
> +}
> +EXPORT_SYMBOL(ptlrpc_dec_ref);
> +
> static int __init ptlrpc_init(void)
> {
> int rc, cleanup_phase = 0;
> @@ -71,24 +107,12 @@ static int __init ptlrpc_init(void)
> if (rc)
> goto cleanup;
>
> - cleanup_phase = 2;
> - rc = ptlrpc_init_portals();
> - if (rc)
> - goto cleanup;
> -
> cleanup_phase = 3;
>
> rc = ptlrpc_connection_init();
> if (rc)
> goto cleanup;
>
> - cleanup_phase = 4;
> - ptlrpc_put_connection_superhack = ptlrpc_connection_put;
> -
> - rc = ptlrpc_start_pinger();
> - if (rc)
> - goto cleanup;
> -
> cleanup_phase = 5;
> rc = ldlm_init();
> if (rc)
> @@ -122,15 +146,9 @@ static int __init ptlrpc_init(void)
> ldlm_exit();
> /* Fall through */
> case 5:
> - ptlrpc_stop_pinger();
> - /* Fall through */
> - case 4:
> ptlrpc_connection_fini();
> /* Fall through */
> case 3:
> - ptlrpc_exit_portals();
> - /* Fall through */
> - case 2:
> ptlrpc_request_cache_fini();
> /* Fall through */
> case 1:
> @@ -150,8 +168,6 @@ static void __exit ptlrpc_exit(void)
> ptlrpc_nrs_fini();
> sptlrpc_fini();
> ldlm_exit();
> - ptlrpc_stop_pinger();
> - ptlrpc_exit_portals();
> ptlrpc_request_cache_fini();
> ptlrpc_hr_fini();
> ptlrpc_connection_fini();
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
2018-03-09 00:34:27

by Dilger, Andreas

Subject: Re: [PATCH 16/17] staging: lustre: allow monolithic builds

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> Remove the restriction that lustre must be built
> as modules. It now works as a monolithic build.
>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> drivers/staging/lustre/lnet/Kconfig | 2 +-
> drivers/staging/lustre/lustre/Kconfig | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/Kconfig b/drivers/staging/lustre/lnet/Kconfig
> index 6bcb53d0c6f4..ad049e6f24e4 100644
> --- a/drivers/staging/lustre/lnet/Kconfig
> +++ b/drivers/staging/lustre/lnet/Kconfig
> @@ -1,6 +1,6 @@
> config LNET
> tristate "Lustre networking subsystem (LNet)"
> - depends on INET && m
> + depends on INET
> help
> The Lustre network layer, also known as LNet, is a networking abstaction
> level API that was initially created to allow Lustre Filesystem to utilize
> diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
> index 90d826946c6a..c669c8fa0cc6 100644
> --- a/drivers/staging/lustre/lustre/Kconfig
> +++ b/drivers/staging/lustre/lustre/Kconfig
> @@ -1,6 +1,6 @@
> config LUSTRE_FS
> tristate "Lustre file system client support"
> - depends on m && !MIPS && !XTENSA && !SUPERH
> + depends on !MIPS && !XTENSA && !SUPERH
> depends on LNET
> select CRYPTO
> select CRYPTO_CRC32
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
2018-03-09 00:38:33

by Dilger, Andreas

Subject: Re: [PATCH 17/17] Revert "staging: Disable lustre file system for MIPS, SH, and XTENSA"

On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>
> This reverts commit 16f1eeb660bd2bfd223704ee6350706b39c55a7a.
>
> The reason for this patch was that lustre used copy_from_user_page.
> Commit 76133e66b141 ("staging/lustre: Replace jobid acquiring with per
> node setting") removed that usage.
> So the arch restrictions can go.

I don't think these platforms will be used with Lustre any time soon,
but who knows... :-)

In any case, thanks for this patch series, definitely a lot of good
cleanups in there.

Reviewed-by: Andreas Dilger <[email protected]>

> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/staging/lustre/lustre/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
> index c669c8fa0cc6..ccb78a945995 100644
> --- a/drivers/staging/lustre/lustre/Kconfig
> +++ b/drivers/staging/lustre/lustre/Kconfig
> @@ -1,6 +1,5 @@
> config LUSTRE_FS
> tristate "Lustre file system client support"
> - depends on !MIPS && !XTENSA && !SUPERH
> depends on LNET
> select CRYPTO
> select CRYPTO_CRC32
>
>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
2018-03-11 21:39:08

by NeilBrown

Subject: Re: [PATCH 11/17] staging: lustre: ptlrpc: use workqueue for pinger

On Thu, Mar 08 2018, Dilger, Andreas wrote:

> On Mar 1, 2018, at 16:31, NeilBrown <[email protected]> wrote:
>>
>> lustre has a "Pinger" kthread which periodically pings peers
>> to ensure all hosts are functioning.
>>
>> This can more easily be done using a work queue.
>>
>> As maintaining contact with other peers is important for
>> keeping the filesystem running, and as the filesystem might
>> be involved in freeing memory, it is safest to have a
>> separate WQ_MEM_RECLAIM workqueue.
>>
>> The SVC_EVENT functionality to wake up the thread can be
>> replaced with mod_delayed_work().
>>
>> Also use round_jiffies_up_relative() rather than setting a
>> minimum of 1 second delay. The PING_INTERVAL is measured in
>> seconds, so this meets the need while allowing the workqueue to
>> keep wakeups synchronized.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>
> Looks reasonable. Fortunately, pinging the server does not need
> to be very accurate since it is only done occasionally when the
> client is otherwise idle, so it shouldn't matter if the workqueue
> operation is delayed by a few seconds.
>
> Reviewed-by: Andreas Dilger <[email protected]>

Thanks a lot for the thorough review!
The above implies that we don't need WQ_MEM_RECLAIM. I didn't dig into
exactly when and why pings happen, so I thought having WQ_MEM_RECLAIM
was safest. If pings only happen when the client is otherwise idle, then
it isn't needed. I'll remove it.

Thanks,
NeilBrown


>
>> ---
>> drivers/staging/lustre/lustre/ptlrpc/pinger.c | 81 +++++++------------------
>> 1 file changed, 24 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> index b5f3cfee8e75..0775b7a048bb 100644
>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> @@ -217,21 +217,18 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
>> }
>> }
>>
>> -static int ptlrpc_pinger_main(void *arg)
>> -{
>> - struct ptlrpc_thread *thread = arg;
>> -
>> - /* Record that the thread is running */
>> - thread_set_flags(thread, SVC_RUNNING);
>> - wake_up(&thread->t_ctl_waitq);
>> +static struct workqueue_struct *pinger_wq;
>> +static void ptlrpc_pinger_main(struct work_struct *ws);
>> +static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>
>> - /* And now, loop forever, pinging as needed. */
>> - while (1) {
>> - unsigned long this_ping = cfs_time_current();
>> - long time_to_next_wake;
>> - struct timeout_item *item;
>> - struct obd_import *imp;
>> +static void ptlrpc_pinger_main(struct work_struct *ws)
>> +{
>> + unsigned long this_ping = cfs_time_current();
>> + long time_to_next_wake;
>> + struct timeout_item *item;
>> + struct obd_import *imp;
>>
>> + do {
>> mutex_lock(&pinger_mutex);
>> list_for_each_entry(item, &timeout_list, ti_chain) {
>> item->ti_cb(item, item->ti_cb_data);
>> @@ -260,50 +257,24 @@ static int ptlrpc_pinger_main(void *arg)
>> time_to_next_wake,
>> cfs_time_add(this_ping,
>> PING_INTERVAL * HZ));
>> - if (time_to_next_wake > 0) {
>> - wait_event_idle_timeout(thread->t_ctl_waitq,
>> - thread_is_stopping(thread) ||
>> - thread_is_event(thread),
>> - max_t(long, time_to_next_wake, HZ));
>> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
>> - break;
>> - /* woken after adding import to reset timer */
>> - thread_test_and_clear_flags(thread, SVC_EVENT);
>> - }
>> - }
>> + } while (time_to_next_wake <= 0);
>>
>> - thread_set_flags(thread, SVC_STOPPED);
>> - wake_up(&thread->t_ctl_waitq);
>> -
>> - CDEBUG(D_NET, "pinger thread exiting, process %d\n", current_pid());
>> - return 0;
>> + queue_delayed_work(pinger_wq, &ping_work,
>> + round_jiffies_up_relative(time_to_next_wake));
>> }
>>
>> -static struct ptlrpc_thread pinger_thread;
>> -
>> int ptlrpc_start_pinger(void)
>> {
>> - struct task_struct *task;
>> - int rc;
>> -
>> - if (!thread_is_init(&pinger_thread) &&
>> - !thread_is_stopped(&pinger_thread))
>> + if (pinger_wq)
>> return -EALREADY;
>>
>> - init_waitqueue_head(&pinger_thread.t_ctl_waitq);
>> -
>> - strcpy(pinger_thread.t_name, "ll_ping");
>> -
>> - task = kthread_run(ptlrpc_pinger_main, &pinger_thread,
>> - pinger_thread.t_name);
>> - if (IS_ERR(task)) {
>> - rc = PTR_ERR(task);
>> - CERROR("cannot start pinger thread: rc = %d\n", rc);
>> - return rc;
>> + pinger_wq = alloc_workqueue("ptlrpc_pinger", WQ_MEM_RECLAIM, 1);
>> + if (!pinger_wq) {
>> + CERROR("cannot start pinger workqueue\n");
>> + return -ENOMEM;
>> }
>> - wait_event_idle(pinger_thread.t_ctl_waitq,
>> - thread_is_running(&pinger_thread));
>>
>> + queue_delayed_work(pinger_wq, &ping_work, 0);
>> return 0;
>> }
>>
>> @@ -313,16 +284,13 @@ int ptlrpc_stop_pinger(void)
>> {
>> int rc = 0;
>>
>> - if (thread_is_init(&pinger_thread) ||
>> - thread_is_stopped(&pinger_thread))
>> + if (!pinger_wq)
>> return -EALREADY;
>>
>> ptlrpc_pinger_remove_timeouts();
>> - thread_set_flags(&pinger_thread, SVC_STOPPING);
>> - wake_up(&pinger_thread.t_ctl_waitq);
>> -
>> - wait_event_idle(pinger_thread.t_ctl_waitq,
>> - thread_is_stopped(&pinger_thread));
>> + cancel_delayed_work_sync(&ping_work);
>> + destroy_workqueue(pinger_wq);
>> + pinger_wq = NULL;
>>
>> return rc;
>> }
>> @@ -505,6 +473,5 @@ static int ptlrpc_pinger_remove_timeouts(void)
>>
>> void ptlrpc_pinger_wake_up(void)
>> {
>> - thread_add_flags(&pinger_thread, SVC_EVENT);
>> - wake_up(&pinger_thread.t_ctl_waitq);
>> + mod_delayed_work(pinger_wq, &ping_work, 0);
>> }
>>
>>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
