My intention is to make it easier to manipulate and maintain kthreads.
In particular, I want to replace all the custom main loops with a
generic one. I also want to make the kthreads sleep in a consistent
state, in a common place, when there is no work.
My first attempt was a brand new API (iterant kthread), see
http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
directed to improve the existing kthread worker API instead. This is
the 3rd iteration of that new direction.
1st patch: add support to check if a timer callback is being called
2nd..12th patches: improve the existing kthread worker API
13th..18th, 20th, 22nd patches: convert several kthreads into
the kthread worker API, namely: khugepaged, ring buffer
benchmark, hung_task, kmemleak, ipmi, IB/fmr_pool,
memstick/r592, intel_powerclamp
19th, 21st patches: do some preparation steps; they usually do
some clean up that makes sense even without the conversion.
Changes against v2:
+ used worker->lock to synchronize the operations with the work
instead of the PENDING bit as suggested by Tejun Heo; it simplified
the implementation in several ways
+ added timer_active(); used it together with del_timer_sync()
to cancel the work in a less tricky way
+ removed the controversial conversion of the RCU kthreads
+ added several other examples: hung_task, kmemleak, ipmi,
IB/fmr_pool, memstick/r592, intel_powerclamp
+ the helper fixes for the ring buffer benchmark have been improved
as suggested by Steven; they are already in Linus' tree now
+ fixed a possible race between the check for an existing khugepaged
worker and queuing the work
Changes against v1:
+ remove wrappers to manipulate the scheduling policy and priority
+ remove questionable wakeup_and_destroy_kthread_worker() variant
+ do not check for chained work when draining the queue
+ allocate struct kthread_worker in create_kthread_worker() and
use simpler checks for a running worker
+ add support for delayed kthread works and use them instead
of waiting inside the works
+ rework the "unrelated" fixes for the ring buffer benchmark
as discussed in the 1st RFC; also sent separately
+ also convert the consumer in the ring buffer benchmark
I have tested this patch set against the stable Linus tree
for 4.4-rc1.
Petr Mladek (22):
timer: Allow to check when the timer callback has not finished yet
kthread/smpboot: Do not park in kthread_create_on_cpu()
kthread: Allow to call __kthread_create_on_node() with va_list args
kthread: Add create_kthread_worker*()
kthread: Add drain_kthread_worker()
kthread: Add destroy_kthread_worker()
kthread: Detect when a kthread work is used by more workers
kthread: Initial support for delayed kthread work
kthread: Allow to cancel kthread work
kthread: Allow to modify delayed kthread work
kthread: Better support freezable kthread workers
kthread: Use try_lock_kthread_work() in flush_kthread_work()
mm/huge_page: Convert khugepaged() into kthread worker API
ring_buffer: Convert benchmark kthreads into kthread worker API
hung_task: Convert hungtaskd into kthread worker API
kmemleak: Convert kmemleak kthread into kthread worker API
ipmi: Convert kipmi kthread into kthread worker API
IB/fmr_pool: Convert the cleanup thread into kthread worker API
memstick/r592: Better synchronize debug messages in r592_io kthread
memstick/r592: convert r592_io kthread into kthread worker API
thermal/intel_powerclamp: Remove duplicated code that starts the
kthread
thermal/intel_powerclamp: Convert the kthread to kthread worker API
drivers/char/ipmi/ipmi_si_intf.c | 116 ++++---
drivers/infiniband/core/fmr_pool.c | 54 ++-
drivers/memstick/host/r592.c | 61 ++--
drivers/memstick/host/r592.h | 5 +-
drivers/thermal/intel_powerclamp.c | 302 +++++++++--------
include/linux/kthread.h | 56 ++++
include/linux/timer.h | 2 +
kernel/hung_task.c | 41 ++-
kernel/kthread.c | 618 +++++++++++++++++++++++++++++++----
kernel/smpboot.c | 5 +
kernel/time/timer.c | 24 ++
kernel/trace/ring_buffer_benchmark.c | 133 ++++----
mm/huge_memory.c | 134 ++++----
mm/kmemleak.c | 86 +++--
14 files changed, 1142 insertions(+), 495 deletions(-)
CC: Catalin Marinas <[email protected]>
CC: [email protected]
CC: Corey Minyard <[email protected]>
CC: [email protected]
CC: Doug Ledford <[email protected]>
CC: Sean Hefty <[email protected]>
CC: Hal Rosenstock <[email protected]>
CC: [email protected]
CC: Maxim Levitsky <[email protected]>
CC: Zhang Rui <[email protected]>
CC: Eduardo Valentin <[email protected]>
CC: Jacob Pan <[email protected]>
CC: [email protected]
--
1.8.5.6
timer_pending() only checks whether the timer is queued in the list of
pending callbacks. Each callback is removed from the list before it is
called, see call_timer_fn() in __run_timers().
Sometimes we need to make sure that the callback has finished,
for example, when we want to free some resources that are accessed
by the callback.
For this purpose, this patch adds timer_active(). It checks both
the list of callbacks and the running_timer. It takes the base_lock
to see a consistent state.
I plan to use it to implement delayed works in the kthread worker API,
but I guess that it will have wider use. In fact, I wonder whether
timer_pending() is misused in some situations.
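To illustrate the intended use, here is a minimal sketch (struct my_obj
and my_obj_release() are made-up names); it mirrors the pattern used
later in this series when canceling delayed kthread works:

  #include <linux/timer.h>
  #include <linux/spinlock.h>
  #include <linux/slab.h>

  /*
   * Hypothetical object; obj->timer was set up elsewhere with a
   * callback that accesses obj->data.
   */
  struct my_obj {
  	spinlock_t lock;
  	struct timer_list timer;
  	void *data;
  };

  static void my_obj_release(struct my_obj *obj)
  {
  	unsigned long flags;

  again:
  	spin_lock_irqsave(&obj->lock, flags);
  	if (!del_timer(&obj->timer) && timer_active(&obj->timer)) {
  		/*
  		 * The callback is running right now. We must not wait
  		 * under the lock, so drop it and let the callback finish.
  		 */
  		spin_unlock_irqrestore(&obj->lock, flags);
  		del_timer_sync(&obj->timer);
  		goto again;
  	}
  	spin_unlock_irqrestore(&obj->lock, flags);

  	/* The timer is neither pending nor running; the data is safe to free. */
  	kfree(obj->data);
  }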
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/timer.h | 2 ++
kernel/time/timer.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 61aa61dc410c..237b7c3e2b4e 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -165,6 +165,8 @@ static inline int timer_pending(const struct timer_list * timer)
return timer->entry.pprev != NULL;
}
+extern int timer_active(struct timer_list *timer);
+
extern void add_timer_on(struct timer_list *timer, int cpu);
extern int del_timer(struct timer_list * timer);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index bbc5d1114583..1c16f3230771 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -778,6 +778,30 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
}
}
+/**
+ * timer_active - is a timer still in use?
+ * @timer: the timer in question
+ *
+ * timer_active() tells whether the timer is pending or whether the callback
+ * is currently running.
+ *
+ * Use this function if you want to make sure that some resources
+ * will no longer get accessed by the timer callback. timer_pending()
+ * is not safe in this case.
+ */
+int timer_active(struct timer_list *timer)
+{
+ struct tvec_base *base;
+ unsigned long flags;
+ int ret;
+
+ base = lock_timer_base(timer, &flags);
+ ret = timer_pending(timer) || base->running_timer == timer;
+ spin_unlock_irqrestore(&base->lock, flags);
+
+ return ret;
+}
+
static inline int
__mod_timer(struct timer_list *timer, unsigned long expires,
bool pending_only, int pinned)
--
1.8.5.6
kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
("kthread: Implement park/unpark facility"). It is currently used
only when enabling a new CPU. For this purpose, the newly created
kthread has to be parked.
The CPU binding is a bit tricky. The kthread is parked when the CPU
is not yet online, and the CPU is bound when the kthread is unparked.
The function would be useful for more per-CPU kthreads, e.g.
bnx2fc_thread, fcoethread. For this purpose, the newly created
kthread should stay in the uninterruptible state instead.
This patch moves the parking into smpboot. The thread is now bound
already when it is created, so the function can be used universally,
and the behavior is consistent with kthread_create() and
kthread_create_on_node().
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/kthread.c | 8 ++++++--
kernel/smpboot.c | 5 +++++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173dca1ae..1ffc11ec5546 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -390,10 +390,10 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
cpu);
if (IS_ERR(p))
return p;
+ kthread_bind(p, cpu);
+ /* CPU hotplug needs to bind it again when unparking the thread. */
set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
to_kthread(p)->cpu = cpu;
- /* Park the thread to get it out of TASK_UNINTERRUPTIBLE state */
- kthread_park(p);
return p;
}
@@ -407,6 +407,10 @@ static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
* which might be about to be cleared.
*/
if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+ /*
+ * Newly created kthread was parked when the CPU was offline.
+ * The binding was lost and we need to set it again.
+ */
if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
__kthread_bind(k, kthread->cpu, TASK_PARKED);
wake_up_state(k, TASK_PARKED);
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index d264f59bff56..79f07014be6e 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -186,6 +186,11 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
kfree(td);
return PTR_ERR(tsk);
}
+ /*
+ * Park the thread so that it could start right on the CPU
+ * when it is available.
+ */
+ kthread_park(tsk);
get_task_struct(tsk);
*per_cpu_ptr(ht->store, cpu) = tsk;
if (ht->create) {
--
1.8.5.6
kthread_create_on_node() implements a bunch of logic to create
the kthread. It is already called by kthread_create_on_cpu().
We are going to extend the kthread worker API and will
need to call kthread_create_on_node() with va_list args there.
This patch is only a refactoring and does not modify the existing
behavior.
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/kthread.c | 72 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 42 insertions(+), 30 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1ffc11ec5546..bfe8742c4217 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -244,33 +244,10 @@ static void create_kthread(struct kthread_create_info *create)
}
}
-/**
- * kthread_create_on_node - create a kthread.
- * @threadfn: the function to run until signal_pending(current).
- * @data: data ptr for @threadfn.
- * @node: task and thread structures for the thread are allocated on this node
- * @namefmt: printf-style name for the thread.
- *
- * Description: This helper function creates and names a kernel
- * thread. The thread will be stopped: use wake_up_process() to start
- * it. See also kthread_run(). The new thread has SCHED_NORMAL policy and
- * is affine to all CPUs.
- *
- * If thread is going to be bound on a particular cpu, give its node
- * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
- * When woken, the thread will run @threadfn() with @data as its
- * argument. @threadfn() can either call do_exit() directly if it is a
- * standalone thread for which no one will call kthread_stop(), or
- * return when 'kthread_should_stop()' is true (which means
- * kthread_stop() has been called). The return value should be zero
- * or a negative error number; it will be passed to kthread_stop().
- *
- * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
- */
-struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
- void *data, int node,
- const char namefmt[],
- ...)
+static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
+ void *data, int node,
+ const char namefmt[],
+ va_list args)
{
DECLARE_COMPLETION_ONSTACK(done);
struct task_struct *task;
@@ -311,11 +288,8 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
- va_list args;
- va_start(args, namefmt);
vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
- va_end(args);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
* The kernel thread should not inherit these properties.
@@ -326,6 +300,44 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
kfree(create);
return task;
}
+
+/**
+ * kthread_create_on_node - create a kthread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @node: task and thread structures for the thread are allocated on this node
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread. The thread will be stopped: use wake_up_process() to start
+ * it. See also kthread_run(). The new thread has SCHED_NORMAL policy and
+ * is affine to all CPUs.
+ *
+ * If thread is going to be bound on a particular cpu, give its node
+ * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
+ * When woken, the thread will run @threadfn() with @data as its
+ * argument. @threadfn() can either call do_exit() directly if it is a
+ * standalone thread for which no one will call kthread_stop(), or
+ * return when 'kthread_should_stop()' is true (which means
+ * kthread_stop() has been called). The return value should be zero
+ * or a negative error number; it will be passed to kthread_stop().
+ *
+ * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
+ */
+struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
+ void *data, int node,
+ const char namefmt[],
+ ...)
+{
+ struct task_struct *task;
+ va_list args;
+
+ va_start(args, namefmt);
+ task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+ va_end(args);
+
+ return task;
+}
EXPORT_SYMBOL(kthread_create_on_node);
static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, long state)
--
1.8.5.6
Kthread workers are currently created using the classic kthread API,
namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
parameter.
This patch defines create_kthread_worker() and
create_kthread_worker_on_cpu() functions that hide these implementation
details. They enforce using kthread_worker_fn() for the main thread.
But I doubt that there are any plans to create an alternative. In fact,
I think that we do not want any alternative main thread function
because it would be hard to keep it consistent with the rest of the
kthread worker API.
The naming and functionality are inspired by the workqueues API, like
the rest of the kthread worker API.
Note that we need to bind per-CPU kthread workers already when they are
created. It makes life easier. kthread_bind() could not be used later
on an already running worker.
This patch does _not_ convert existing kthread workers. The kthread
worker API needs more improvements first, e.g. a function to destroy
the worker.
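A minimal usage sketch of the new API (my_work_fn, my_worker and
my_init are made-up names; error handling is reduced to the minimum):

  #include <linux/kthread.h>
  #include <linux/printk.h>
  #include <linux/err.h>

  static void my_work_fn(struct kthread_work *work)
  {
  	pr_info("processing one work item\n");
  }

  static DEFINE_KTHREAD_WORK(my_work, my_work_fn);
  static struct kthread_worker *my_worker;

  static int my_init(void)
  {
  	my_worker = create_kthread_worker("my_worker");
  	if (IS_ERR(my_worker))
  		return PTR_ERR(my_worker);

  	/* Queue the work and wait until it has been processed. */
  	queue_kthread_work(my_worker, &my_work);
  	flush_kthread_work(&my_work);
  	return 0;
  }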
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 7 ++++
kernel/kthread.c | 99 ++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 96 insertions(+), 10 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index e691b6a23f72..943900c7ce35 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -124,6 +124,13 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
int kthread_worker_fn(void *worker_ptr);
+__printf(1, 2)
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...);
+
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
+
bool queue_kthread_work(struct kthread_worker *worker,
struct kthread_work *work);
void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfe8742c4217..df402e18bb5a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,23 +567,24 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
* kthread_worker_fn - kthread function to process kthread_worker
* @worker_ptr: pointer to initialized kthread_worker
*
- * This function can be used as @threadfn to kthread_create() or
- * kthread_run() with @worker_ptr argument pointing to an initialized
- * kthread_worker. The started kthread will process work_list until
- * the it is stopped with kthread_stop(). A kthread can also call
- * this function directly after extra initialization.
+ * This function implements the main loop of the kthread worker. It processes
+ * work_list until it is stopped with kthread_stop(). It sleeps when the queue
+ * is empty.
*
- * Different kthreads can be used for the same kthread_worker as long
- * as there's only one kthread attached to it at any given time. A
- * kthread_worker without an attached kthread simply collects queued
- * kthread_works.
+ * The works must not hold any locks and must not keep preemption or interrupts
+ * disabled when they finish. A safe point for freezing is defined after one
+ * work finishes and before a new one is started.
*/
int kthread_worker_fn(void *worker_ptr)
{
struct kthread_worker *worker = worker_ptr;
struct kthread_work *work;
- WARN_ON(worker->task);
+ /*
+ * FIXME: Update the check and remove the assignment when all kthread
+ * worker users are created using create_kthread_worker*() functions.
+ */
+ WARN_ON(worker->task && worker->task != current);
worker->task = current;
repeat:
set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
@@ -617,6 +618,84 @@ repeat:
}
EXPORT_SYMBOL_GPL(kthread_worker_fn);
+static struct kthread_worker *
+__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+{
+ struct kthread_worker *worker;
+ struct task_struct *task;
+
+ worker = kzalloc(sizeof(*worker), GFP_KERNEL);
+ if (!worker)
+ return ERR_PTR(-ENOMEM);
+
+ init_kthread_worker(worker);
+
+ if (cpu >= 0)
+ task = kthread_create_on_cpu(kthread_worker_fn, worker,
+ cpu, namefmt);
+ else
+ task = __kthread_create_on_node(kthread_worker_fn, worker,
+ -1, namefmt, args);
+ if (IS_ERR(task))
+ goto fail_task;
+
+ worker->task = task;
+ wake_up_process(task);
+ return worker;
+
+fail_task:
+ kfree(worker);
+ return ERR_CAST(task);
+}
+
+/**
+ * create_kthread_worker - create a kthread worker
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Returns pointer to an allocated worker on success, ERR_PTR(-ENOMEM) when
+ * the needed structures could not get allocated, and ERR_PTR(-EINTR) when
+ * the worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...)
+{
+ struct kthread_worker *worker;
+ va_list args;
+
+ va_start(args, namefmt);
+ worker = __create_kthread_worker(-1, namefmt, args);
+ va_end(args);
+
+ return worker;
+}
+EXPORT_SYMBOL(create_kthread_worker);
+
+/**
+ * create_kthread_worker_on_cpu - create a kthread worker and bind it
+ * to a given CPU and the associated NUMA node.
+ * @cpu: CPU number
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Use a valid CPU number if you want to bind the kthread worker
+ * to the given CPU and the associated NUMA node.
+ *
+ * @namefmt might include one "%d" that will get replaced by CPU number.
+ *
+ * Returns a pointer to the allocated worker on success, ERR_PTR(-EINVAL)
+ * when the CPU number is not valid, ERR_PTR(-ENOMEM) when the needed
+ * structures could not get allocated, and ERR_PTR(-EINTR) when the
+ * worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[])
+{
+ if (cpu < 0 || cpu >= num_possible_cpus())
+ return ERR_PTR(-EINVAL);
+
+ return __create_kthread_worker(cpu, namefmt, NULL);
+}
+EXPORT_SYMBOL(create_kthread_worker_on_cpu);
+
/* insert @work before @pos in @worker */
static void insert_kthread_work(struct kthread_worker *worker,
struct kthread_work *work,
--
1.8.5.6
flush_kthread_worker() returns when the currently queued works have been
processed. But some other works might have been queued in the meantime.
This patch adds drain_kthread_worker() that is inspired by
drain_workqueue(). It returns when the queue is completely empty and
warns when it takes too long.
The initial implementation does not block queuing new works when
draining. It makes things much easier. The blocking would be useful for
debugging potential problems but it is not clear whether it is worth the
complication at the moment.
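A sketch of the expected calling convention (my_worker, my_lock and
my_disable are made-up names); the point is that producers are stopped
before draining:

  #include <linux/kthread.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(my_lock);
  static struct kthread_worker *my_worker;	/* checked by producers under my_lock */

  static void my_disable(void)
  {
  	struct kthread_worker *worker;

  	/* Step 1: hide the worker so that nobody queues new works. */
  	mutex_lock(&my_lock);
  	worker = my_worker;
  	my_worker = NULL;
  	mutex_unlock(&my_lock);

  	if (!worker)
  		return;

  	/* Step 2: wait until the queue, including re-queued works, is empty. */
  	drain_kthread_worker(worker);
  }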
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/kthread.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index df402e18bb5a..a18ad3b58f61 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -804,3 +804,37 @@ void flush_kthread_worker(struct kthread_worker *worker)
wait_for_completion(&fwork.done);
}
EXPORT_SYMBOL_GPL(flush_kthread_worker);
+
+/**
+ * drain_kthread_worker - drain a kthread worker
+ * @worker: worker to be drained
+ *
+ * Wait until there is no work queued for the given kthread worker.
+ * @worker is flushed repeatedly until it becomes empty. The number
+ * of flushing is determined by the depth of chaining and should
+ * be relatively short. Whine if it takes too long.
+ *
+ * The caller is responsible for blocking all users of this kthread
+ * worker from queuing new works. Also it is responsible for blocking
+ * the already queued works from an infinite re-queuing!
+ */
+void drain_kthread_worker(struct kthread_worker *worker)
+{
+ int flush_cnt = 0;
+
+ spin_lock_irq(&worker->lock);
+
+ while (!list_empty(&worker->work_list)) {
+ spin_unlock_irq(&worker->lock);
+
+ flush_kthread_worker(worker);
+ WARN_ONCE(flush_cnt++ > 10,
+ "kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
+ worker->task->comm, flush_cnt);
+
+ spin_lock_irq(&worker->lock);
+ }
+
+ spin_unlock_irq(&worker->lock);
+}
+EXPORT_SYMBOL(drain_kthread_worker);
--
1.8.5.6
The current kthread worker users call flush() and stop() explicitly.
The new destroy_kthread_worker() function drains the worker, stops it,
and frees the kthread_worker struct in a single call.
It is supposed to be used together with create_kthread_worker*() that
allocates struct kthread_worker.
Also note that drain(), unlike flush(), correctly handles self-queuing
works.
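A short sketch of the intended pairing with create_kthread_worker*()
(my_worker and my_stop are made-up names; producer blocking as in the
drain example above):

  /* my_worker was allocated by create_kthread_worker() earlier. */
  static struct kthread_worker *my_worker;

  static void my_stop(void)
  {
  	/* Callers must already be blocked from queuing new works. */
  	destroy_kthread_worker(my_worker);
  	my_worker = NULL;
  }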
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 2 ++
kernel/kthread.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 943900c7ce35..c4a95a3ba500 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -136,4 +136,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
void flush_kthread_work(struct kthread_work *work);
void flush_kthread_worker(struct kthread_worker *worker);
+void destroy_kthread_worker(struct kthread_worker *worker);
+
#endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a18ad3b58f61..1d41e0faef2d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -838,3 +838,24 @@ void drain_kthread_worker(struct kthread_worker *worker)
spin_unlock_irq(&worker->lock);
}
EXPORT_SYMBOL(drain_kthread_worker);
+
+/**
+ * destroy_kthread_worker - destroy a kthread worker
+ * @worker: worker to be destroyed
+ *
+ * Drain and destroy @worker. It has the same conditions
+ * for use as drain_kthread_worker(), see above.
+ */
+void destroy_kthread_worker(struct kthread_worker *worker)
+{
+ struct task_struct *task;
+
+ task = worker->task;
+ if (WARN_ON(!task))
+ return;
+
+ drain_kthread_worker(worker);
+ kthread_stop(task);
+ kfree(worker);
+}
+EXPORT_SYMBOL(destroy_kthread_worker);
--
1.8.5.6
Nothing currently prevents a work from being queued to one kthread
worker while it is already used by another one. This means that the
work might run in parallel on more workers. Also some operations, e.g.
flush or drain, are not reliable.
This problem will be even more visible after we add the
cancel_kthread_work() function. It will take only the work as
a parameter and will use worker->lock to synchronize with the others.
Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.
This patch adds a warning when we try to insert the work for another
worker. It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.
Note that we need to clear the information about the current worker
when the work is no longer used. It is important when the worker
is destroyed and later created again. For example, this is
useful when a service might get disabled and enabled via sysfs.
Also note that the kthread_work_pending() function will get more
complicated once we add support for delayed kthread works and
allow canceling works.
Just for completeness, the patch adds a check for disabled interrupts
and an empty queue.
The patch also puts all the checks into a separate function. It will
be reused when implementing delayed works.
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/kthread.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1d41e0faef2d..378d2203c8b0 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -563,6 +563,18 @@ void __init_kthread_worker(struct kthread_worker *worker,
}
EXPORT_SYMBOL_GPL(__init_kthread_worker);
+/*
+ * Returns true when there is a pending operation for this work.
+ * In particular, it checks if the work is:
+ * - queued
+ *
+ * This function must be called with locked work.
+ */
+static inline bool kthread_work_pending(const struct kthread_work *work)
+{
+ return !list_empty(&work->node);
+}
+
/**
* kthread_worker_fn - kthread function to process kthread_worker
* @worker_ptr: pointer to initialized kthread_worker
@@ -574,6 +586,9 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
* The works are not allowed to keep any locks, disable preemption or interrupts
* when they finish. There is defined a safe point for freezing when one work
* finishes and before a new one is started.
+ *
+ * Also the works must not be handled by more workers at the same time, see also
+ * queue_kthread_work().
*/
int kthread_worker_fn(void *worker_ptr)
{
@@ -610,6 +625,12 @@ repeat:
if (work) {
__set_current_state(TASK_RUNNING);
work->func(work);
+
+ spin_lock_irq(&worker->lock);
+ /* Allow to queue the work into another worker */
+ if (!kthread_work_pending(work))
+ work->worker = NULL;
+ spin_unlock_irq(&worker->lock);
} else if (!freezing(current))
schedule();
@@ -696,12 +717,22 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[])
}
EXPORT_SYMBOL(create_kthread_worker_on_cpu);
+static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
+ struct kthread_work *work)
+{
+ lockdep_assert_held(&worker->lock);
+ WARN_ON_ONCE(!irqs_disabled());
+ WARN_ON_ONCE(!list_empty(&work->node));
+ /* Do not use a work with more workers, see queue_kthread_work() */
+ WARN_ON_ONCE(work->worker && work->worker != worker);
+}
+
/* insert @work before @pos in @worker */
static void insert_kthread_work(struct kthread_worker *worker,
- struct kthread_work *work,
- struct list_head *pos)
+ struct kthread_work *work,
+ struct list_head *pos)
{
- lockdep_assert_held(&worker->lock);
+ insert_kthread_work_sanity_check(worker, work);
list_add_tail(&work->node, pos);
work->worker = worker;
@@ -717,6 +748,12 @@ static void insert_kthread_work(struct kthread_worker *worker,
* Queue @work to work processor @task for async execution. @task
* must have been created with kthread_worker_create(). Returns %true
* if @work was successfully queued, %false if it was already pending.
+ *
+ * Never queue a work into a worker when it is being processed by another
+ * one. Otherwise, some operations, e.g. cancel or flush, will not work
+ * correctly or the work might run in parallel. This is not enforced
+ * because it would make the code too complex. There are only warnings
+ * printed when such a situation is detected.
*/
bool queue_kthread_work(struct kthread_worker *worker,
struct kthread_work *work)
@@ -725,7 +762,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
unsigned long flags;
spin_lock_irqsave(&worker->lock, flags);
- if (list_empty(&work->node)) {
+ if (!kthread_work_pending(work)) {
insert_kthread_work(worker, work, &worker->work_list);
ret = true;
}
--
1.8.5.6
We are going to use kthread_worker more widely and delayed works
will be pretty useful.
The implementation is inspired by workqueues. It uses a timer to
queue the work after the requested delay. If the delay is zero,
the work is queued immediately.
Compared with workqueues, each work is associated with a single
worker (kthread). Therefore the implementation can be much simpler.
In particular, we use worker->lock to synchronize all the
operations with the work and we do not use any flags variable.
On the other hand, we add a pointer[*] to the timer into struct
kthread_work. The kthread worker needs to know whether the work is
still in use before it can clear work->worker, and for this it needs
to know whether the timer is active.
Finally, the timer callback knows only about the struct kthread_work.
It is better to be paranoid and take worker->lock carefully.
The try_lock_kthread_work() function will later be useful also when
canceling the work.
[*] I also considered adding the entire struct timer_list into
struct kthread_work. But it would increase the size from
40 to 120 bytes on x86_64 with often unused data.
Another alternative was to add a flags variable. But this
would add extra code to synchronize it with the state
of the timer.
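A minimal sketch of a delayed work that re-arms itself (my_worker,
my_poll_fn and the one second period are made up):

  #include <linux/kthread.h>
  #include <linux/printk.h>
  #include <linux/jiffies.h>

  static struct kthread_worker *my_worker;	/* from create_kthread_worker() */

  static void my_poll_fn(struct kthread_work *work);
  static DEFINE_DELAYED_KTHREAD_WORK(my_poll_work, my_poll_fn);

  static void my_poll_fn(struct kthread_work *work)
  {
  	pr_info("polling\n");
  	/* Re-arm: run again after roughly one second. */
  	queue_delayed_kthread_work(my_worker, &my_poll_work, HZ);
  }

  static void my_start_polling(void)
  {
  	/* A zero delay queues the work immediately. */
  	queue_delayed_kthread_work(my_worker, &my_poll_work, 0);
  }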
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 34 +++++++++++++
kernel/kthread.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 162 insertions(+), 2 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c4a95a3ba500..1a5738dcdf8d 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -63,6 +63,7 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
*/
struct kthread_work;
typedef void (*kthread_work_func_t)(struct kthread_work *work);
+void delayed_kthread_work_timer_fn(unsigned long __data);
struct kthread_worker {
spinlock_t lock;
@@ -75,6 +76,12 @@ struct kthread_work {
struct list_head node;
kthread_work_func_t func;
struct kthread_worker *worker;
+ struct timer_list *timer;
+};
+
+struct delayed_kthread_work {
+ struct kthread_work work;
+ struct timer_list timer;
};
#define KTHREAD_WORKER_INIT(worker) { \
@@ -87,12 +94,24 @@ struct kthread_work {
.func = (fn), \
}
+#define DELAYED_KTHREAD_WORK_INIT(dwork, fn) { \
+ .work = KTHREAD_WORK_INIT((dwork).work, (fn)), \
+ .timer = __TIMER_INITIALIZER(delayed_kthread_work_timer_fn, \
+ 0, (unsigned long)&(dwork), \
+ TIMER_IRQSAFE), \
+ .work.timer = &(dwork).timer, \
+ }
+
#define DEFINE_KTHREAD_WORKER(worker) \
struct kthread_worker worker = KTHREAD_WORKER_INIT(worker)
#define DEFINE_KTHREAD_WORK(work, fn) \
struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
+#define DEFINE_DELAYED_KTHREAD_WORK(dwork, fn) \
+ struct delayed_kthread_work dwork = \
+ DELAYED_KTHREAD_WORK_INIT(dwork, fn)
+
/*
* kthread_worker.lock needs its own lockdep class key when defined on
* stack with lockdep enabled. Use the following macros in such cases.
@@ -122,6 +141,16 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
(work)->func = (fn); \
} while (0)
+#define init_delayed_kthread_work(dwork, fn) \
+ do { \
+ init_kthread_work(&(dwork)->work, (fn)); \
+ __setup_timer(&(dwork)->timer, \
+ delayed_kthread_work_timer_fn, \
+ (unsigned long)(dwork), \
+ TIMER_IRQSAFE); \
+ (dwork)->work.timer = &(dwork)->timer; \
+ } while (0)
+
int kthread_worker_fn(void *worker_ptr);
__printf(1, 2)
@@ -133,6 +162,11 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
bool queue_kthread_work(struct kthread_worker *worker,
struct kthread_work *work);
+
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay);
+
void flush_kthread_work(struct kthread_work *work);
void flush_kthread_worker(struct kthread_worker *worker);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 378d2203c8b0..0f4b348c2c7e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,12 +567,14 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
* Returns true when there is a pending operation for this work.
* In particular, it checks if the work is:
* - queued
+ * - a timer is running to queue this delayed work
*
* This function must be called with locked work.
*/
static inline bool kthread_work_pending(const struct kthread_work *work)
{
- return !list_empty(&work->node);
+ return !list_empty(&work->node) ||
+ (work->timer && timer_active(work->timer));
}
/**
@@ -740,6 +742,15 @@ static void insert_kthread_work(struct kthread_worker *worker,
wake_up_process(worker->task);
}
+/*
+ * Queue @work right into the worker queue.
+ */
+static void __queue_kthread_work(struct kthread_worker *worker,
+ struct kthread_work *work)
+{
+ insert_kthread_work(worker, work, &worker->work_list);
+}
+
/**
* queue_kthread_work - queue a kthread_work
* @worker: target kthread_worker
@@ -763,7 +774,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
spin_lock_irqsave(&worker->lock, flags);
if (!kthread_work_pending(work)) {
- insert_kthread_work(worker, work, &worker->work_list);
+ __queue_kthread_work(worker, work);
ret = true;
}
spin_unlock_irqrestore(&worker->lock, flags);
@@ -771,6 +782,121 @@ bool queue_kthread_work(struct kthread_worker *worker,
}
EXPORT_SYMBOL_GPL(queue_kthread_work);
+static bool try_lock_kthread_work(struct kthread_work *work)
+{
+ struct kthread_worker *worker;
+ int ret = false;
+
+try_again:
+ worker = work->worker;
+
+ if (!worker)
+ goto out;
+
+ spin_lock(&worker->lock);
+ if (worker != work->worker) {
+ spin_unlock(&worker->lock);
+ goto try_again;
+ }
+ ret = true;
+
+out:
+ return ret;
+}
+
+static inline void unlock_kthread_work(struct kthread_work *work)
+{
+ spin_unlock(&work->worker->lock);
+}
+
+/**
+ * delayed_kthread_work_timer_fn - callback that queues the associated delayed
+ * kthread work when the timer expires.
+ * @__data: pointer to the data associated with the timer
+ *
+ * The format of the function is defined by struct timer_list.
+ * It is supposed to be called from an irqsafe timer with interrupts already disabled.
+ */
+void delayed_kthread_work_timer_fn(unsigned long __data)
+{
+ struct delayed_kthread_work *dwork =
+ (struct delayed_kthread_work *)__data;
+ struct kthread_work *work = &dwork->work;
+
+ if (WARN_ON(!try_lock_kthread_work(work)))
+ return;
+
+ __queue_kthread_work(work->worker, work);
+ unlock_kthread_work(work);
+}
+EXPORT_SYMBOL(delayed_kthread_work_timer_fn);
+
+void __queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay)
+{
+ struct timer_list *timer = &dwork->timer;
+ struct kthread_work *work = &dwork->work;
+
+ WARN_ON_ONCE(timer->function != delayed_kthread_work_timer_fn ||
+ timer->data != (unsigned long)dwork);
+ WARN_ON_ONCE(timer_pending(timer));
+
+ /*
+ * If @delay is 0, queue @dwork->work immediately. This is for
+ * both optimization and correctness. The earliest @timer can
+ * expire is on the closest next tick and delayed_work users depend
+ * on that there's no such delay when @delay is 0.
+ */
+ if (!delay) {
+ __queue_kthread_work(worker, work);
+ return;
+ }
+
+ /* Be paranoid and try to detect possible races already now. */
+ insert_kthread_work_sanity_check(worker, work);
+
+ work->worker = worker;
+ timer_stats_timer_set_start_info(&dwork->timer);
+ timer->expires = jiffies + delay;
+ add_timer(timer);
+}
+
+/**
+ * queue_delayed_kthread_work - queue the associated kthread work
+ * after a delay.
+ * @worker: target kthread_worker
+ * @dwork: delayed_kthread_work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If the work has not been pending it starts a timer that will queue
+ * the work after the given @delay. If @delay is zero, it queues the
+ * work immediately.
+ *
+ * Return: %false if the @work was already pending. It means that
+ * either the timer was running or the work was queued. Returns %true
+ * otherwise.
+ */
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay)
+{
+ struct kthread_work *work = &dwork->work;
+ unsigned long flags;
+ bool ret = false;
+
+ spin_lock_irqsave(&worker->lock, flags);
+
+ if (!kthread_work_pending(work)) {
+ __queue_delayed_kthread_work(worker, dwork, delay);
+ ret = true;
+ }
+
+ spin_unlock_irqrestore(&worker->lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(queue_delayed_kthread_work);
+
struct kthread_flush_work {
struct kthread_work work;
struct completion done;
--
1.8.5.6
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.
This patch implements cancel_*_sync() operations inspired by
workqueues. We are synchronized against the other operations via
the worker lock and we use del_timer_sync() and a counter of
parallel cancel operations. Therefore the implementation can be simpler.
First, we try to lock the work. If this fails, it means that
no worker is assigned and we are done.
Second, we try to cancel the timer when it exists. A problem is when
the timer callback is running at the same time. In this case, we need
to release the lock to avoid a deadlock and start from the beginning.
Third, we try to remove the work from the worker list.
Fourth, if the work is running, we call flush_kthread_work(). It might
take an arbitrary time. In the meantime, queuing of the work is blocked
by the new canceling counter.
As already mentioned, the check for a pending kthread work is done under
a lock. In contrast to workqueues, we do not need to fight for a single
PENDING bit to block other operations. Therefore we do not suffer from
the thundering herd problem and all parallel canceling jobs can use
flush_kthread_work(). Any queuing is blocked until the counter is zero.
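A sketch of a typical shutdown sequence (my_shutdown is a made-up name;
it is similar to the khugepaged conversion later in this series):

  /* Stop a self-re-arming delayed work and tear the worker down. */
  static void my_shutdown(struct kthread_worker *worker,
  			struct delayed_kthread_work *dwork)
  {
  	/* Producers must already be prevented from queuing new works. */
  	cancel_delayed_kthread_work_sync(dwork);
  	destroy_kthread_worker(worker);
  }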
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 4 ++
kernel/kthread.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 146 insertions(+)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 1a5738dcdf8d..dd2a587a2bd7 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -77,6 +77,7 @@ struct kthread_work {
kthread_work_func_t func;
struct kthread_worker *worker;
struct timer_list *timer;
+ int canceling;
};
struct delayed_kthread_work {
@@ -170,6 +171,9 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
void flush_kthread_work(struct kthread_work *work);
void flush_kthread_worker(struct kthread_worker *worker);
+bool cancel_kthread_work_sync(struct kthread_work *work);
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *work);
+
void destroy_kthread_worker(struct kthread_worker *worker);
#endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 0f4b348c2c7e..d12aa91cc44d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,6 +567,7 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
* Returns true when there is a pending operation for this work.
* In particular, it checks if the work is:
* - queued
+ * - being cancelled
* - a timer is running to queue this delayed work
*
* This function must be called with locked work.
@@ -574,6 +575,7 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
static inline bool kthread_work_pending(const struct kthread_work *work)
{
return !list_empty(&work->node) ||
+ work->canceling ||
(work->timer && timer_active(work->timer));
}
@@ -950,6 +952,146 @@ retry:
EXPORT_SYMBOL_GPL(flush_kthread_work);
/**
+ * try_to_cancel_kthread_work - Try to cancel kthread work.
+ * @work: work item to cancel
+ * @lock: lock used to protect the work
+ * @flags: flags stored when the lock was taken
+ *
+ * This function tries to cancel the given kthread work by deleting
+ * the timer and by removing the work from the queue.
+ *
+ * If the timer callback is in progress, it waits until it finishes
+ * but it has to drop the lock to avoid a deadlock.
+ *
+ * Return:
+ * 1 if @work was pending and successfully canceled
+ * 0 if @work was not pending
+ * -EAGAIN if the lock was dropped. The caller is supposed to
+ * take the lock again and repeat the operation.
+ */
+static int
+try_to_cancel_kthread_work(struct kthread_work *work,
+ spinlock_t *lock,
+ unsigned long *flags)
+{
+ int ret = 0;
+
+ if (work->timer) {
+ /* Try to cancel the timer if pending. */
+ if (del_timer(work->timer)) {
+ ret = 1;
+ goto out;
+ }
+
+ /* Are we racing with the timer callback? */
+ if (timer_active(work->timer)) {
+ /* Bad luck, need to avoid a deadlock. */
+ spin_unlock_irqrestore(lock, *flags);
+ del_timer_sync(work->timer);
+ ret = -EAGAIN;
+ goto out;
+ }
+ }
+
+ /* Try to remove queued work before it is being executed. */
+ if (!list_empty(&work->node)) {
+ list_del_init(&work->node);
+ ret = 1;
+ }
+
+out:
+ return ret;
+}
+
+static bool __cancel_kthread_work_sync(struct kthread_work *work)
+{
+ struct kthread_worker *worker;
+ unsigned long flags;
+ int ret;
+
+try_again:
+ local_irq_save(flags);
+ if (!try_lock_kthread_work(work)) {
+ local_irq_restore(flags);
+ ret = 0;
+ goto out;
+ }
+ worker = work->worker;
+
+ ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
+ if (ret == -EAGAIN)
+ goto try_again;
+
+ if (worker->current_work != work)
+ goto out_fast;
+
+ /*
+ * Need to wait until the work finished. Block queueing
+ * in the meantime.
+ */
+ work->canceling++;
+ spin_unlock_irqrestore(&worker->lock, flags);
+ flush_kthread_work(work);
+ /*
+ * Nobody is allowed to switch the worker or queue the work
+ * when .canceling is set
+ */
+ spin_lock_irqsave(&worker->lock, flags);
+ work->canceling--;
+
+out_fast:
+ /*
+ * Allow to queue the work into another worker if there is no other
+ * pending operation.
+ */
+ if (!work->canceling)
+ work->worker = NULL;
+ spin_unlock_irqrestore(&worker->lock, flags);
+
+out:
+ return ret;
+}
+
+/**
+ * cancel_kthread_work_sync - cancel a kthread work and wait for it to finish
+ * @work: the kthread work to cancel
+ *
+ * Cancel @work and wait for its execution to finish. This function
+ * can be used even if the work re-queues itself. On return from this
+ * function, @work is guaranteed to be not pending or executing on any CPU.
+ *
+ * The caller must ensure that the worker on which @work was last
+ * queued can't be destroyed before this function returns.
+ *
+ * Return:
+ * %true if @work was pending, %false otherwise.
+ */
+bool cancel_kthread_work_sync(struct kthread_work *work)
+{
+ /* Rather use cancel_delayed_kthread_work_sync() for delayed works. */
+ WARN_ON_ONCE(work->timer);
+
+ return __cancel_kthread_work_sync(work);
+}
+EXPORT_SYMBOL_GPL(cancel_kthread_work_sync);
+
+/**
+ * cancel_delayed_kthread_work_sync - cancel a delayed kthread work and
+ * wait for it to finish.
+ * @dwork: the delayed kthread work to cancel
+ *
+ * This is cancel_kthread_work_sync() for delayed works.
+ *
+ * Return:
+ * %true if @dwork was pending, %false otherwise.
+ */
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *dwork)
+{
+ return __cancel_kthread_work_sync(&dwork->work);
+}
+EXPORT_SYMBOL_GPL(cancel_delayed_kthread_work_sync);
+
+/**
* flush_kthread_worker - flush all current works on a kthread_worker
* @worker: worker to flush
*
--
1.8.5.6
There are situations when we need to modify the delay of a delayed
kthread work. For example, when the work depends on an event and the
initial delay means a timeout. Then we want to queue the work
immediately when the event happens.
This patch implements mod_delayed_kthread_work() as inspired by
workqueues. It tries to cancel the pending work and queue it again
with the given delay.
A very special case is when the work is being canceled at the same time.
The cancel_*_kthread_work_sync() operations block queuing until the
running work finishes. Therefore we do nothing and let cancel() win.
This should not normally happen as the caller is supposed to synchronize
these operations in a reasonable way.
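A sketch of the timeout use case described above (my_arm_timeout,
my_event_arrived and the 5 second delay are made up):

  #include <linux/kthread.h>
  #include <linux/jiffies.h>

  /* Arm a 5 second timeout unless the awaited event arrives earlier. */
  static void my_arm_timeout(struct kthread_worker *worker,
  			   struct delayed_kthread_work *dwork)
  {
  	queue_delayed_kthread_work(worker, dwork, 5 * HZ);
  }

  static void my_event_arrived(struct kthread_worker *worker,
  			     struct delayed_kthread_work *dwork)
  {
  	/* The event is here: shorten the pending delay and run now. */
  	mod_delayed_kthread_work(worker, dwork, 0);
  }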
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 4 ++++
kernel/kthread.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index dd2a587a2bd7..f501dfeaa0e3 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -168,6 +168,10 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
struct delayed_kthread_work *dwork,
unsigned long delay);
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay);
+
void flush_kthread_work(struct kthread_work *work);
void flush_kthread_worker(struct kthread_worker *worker);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index d12aa91cc44d..4c3b845c719e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1003,6 +1003,56 @@ out:
return ret;
}
+/**
+ * mod_delayed_kthread_work - modify delay of or queue a delayed kthread work
+ * @worker: kthread worker to use
+ * @dwork: delayed kthread work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If @dwork is idle, equivalent to queue_delayed_kthread_work(). Otherwise,
+ * modify @dwork's timer so that it expires after @delay. If @delay is zero,
+ * @work is guaranteed to be queued immediately.
+ *
+ * Return: %false if @dwork was idle and queued. Return %true if @dwork was
+ * pending and its timer was modified.
+ *
+ * A special case is when cancel_kthread_work_sync() is running in parallel.
+ * It blocks further queuing. We let the cancel() win and return %false.
+ * The caller is supposed to synchronize these operations a reasonable way.
+ *
+ * This function is safe to call from any context including IRQ handler.
+ * See try_to_cancel_kthread_work() for details.
+ */
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+ struct delayed_kthread_work *dwork,
+ unsigned long delay)
+{
+ struct kthread_work *work = &dwork->work;
+ unsigned long flags;
+ int ret = 0;
+
+try_again:
+ spin_lock_irqsave(&worker->lock, flags);
+ WARN_ON_ONCE(work->worker && work->worker != worker);
+
+ if (work->canceling)
+ goto out;
+
+ ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
+ if (ret == -EAGAIN)
+ goto try_again;
+
+ if (work->canceling)
+ ret = 0;
+ else
+ __queue_delayed_kthread_work(worker, dwork, delay);
+
+out:
+ spin_unlock_irqrestore(&worker->lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(mod_delayed_kthread_work);
+
static bool __cancel_kthread_work_sync(struct kthread_work *work)
{
struct kthread_worker *worker;
--
1.8.5.6
This patch allows making a kthread worker freezable via a new @flags
parameter. It will allow avoiding an init work in some kthreads.
It does not affect the current users of kthread_worker_fn()
but it might help to do some optimizations or fixes eventually.
I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.
Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.
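A sketch of the updated call (the worker name is made up; 0 can be
passed when no flag is needed):

  static struct kthread_worker *my_create_worker(void)
  {
  	/* The kthread behind this worker will freeze during suspend. */
  	return create_kthread_worker(KTW_FREEZABLE, "my_freezable_worker");
  }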
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/kthread.h | 11 ++++++++---
kernel/kthread.c | 17 ++++++++++++-----
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index f501dfeaa0e3..2dad7020047f 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -65,7 +65,12 @@ struct kthread_work;
typedef void (*kthread_work_func_t)(struct kthread_work *work);
void delayed_kthread_work_timer_fn(unsigned long __data);
+enum {
+ KTW_FREEZABLE = 1 << 2, /* freeze during suspend */
+};
+
struct kthread_worker {
+ unsigned int flags;
spinlock_t lock;
struct list_head work_list;
struct task_struct *task;
@@ -154,12 +159,12 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
int kthread_worker_fn(void *worker_ptr);
-__printf(1, 2)
+__printf(2, 3)
struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...);
+create_kthread_worker(unsigned int flags, const char namefmt[], ...);
struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
+create_kthread_worker_on_cpu(unsigned int flags, int cpu, const char namefmt[]);
bool queue_kthread_work(struct kthread_worker *worker,
struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4c3b845c719e..dbd090466e2a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -556,6 +556,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
const char *name,
struct lock_class_key *key)
{
+ worker->flags = 0;
spin_lock_init(&worker->lock);
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
@@ -605,6 +606,10 @@ int kthread_worker_fn(void *worker_ptr)
*/
WARN_ON(worker->task && worker->task != current);
worker->task = current;
+
+ if (worker->flags & KTW_FREEZABLE)
+ set_freezable();
+
repeat:
set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
@@ -644,7 +649,8 @@ repeat:
EXPORT_SYMBOL_GPL(kthread_worker_fn);
static struct kthread_worker *
-__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+__create_kthread_worker(unsigned int flags, int cpu,
+ const char namefmt[], va_list args)
{
struct kthread_worker *worker;
struct task_struct *task;
@@ -664,6 +670,7 @@ __create_kthread_worker(int cpu, const char namefmt[], va_list args)
if (IS_ERR(task))
goto fail_task;
+ worker->flags = flags;
worker->task = task;
wake_up_process(task);
return worker;
@@ -682,13 +689,13 @@ fail_task:
* the worker was SIGKILLed.
*/
struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...)
+create_kthread_worker(unsigned int flags, const char namefmt[], ...)
{
struct kthread_worker *worker;
va_list args;
va_start(args, namefmt);
- worker = __create_kthread_worker(-1, namefmt, args);
+ worker = __create_kthread_worker(flags, -1, namefmt, args);
va_end(args);
return worker;
@@ -712,12 +719,12 @@ EXPORT_SYMBOL(create_kthread_worker);
* ERR_PTR(-EINVAL) on invalid @cpu.
*/
struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[])
+create_kthread_worker_on_cpu(unsigned int flags, int cpu, const char namefmt[])
{
if (cpu < 0 || cpu > num_possible_cpus())
return ERR_PTR(-EINVAL);
- return __create_kthread_worker(cpu, namefmt, NULL);
+ return __create_kthread_worker(flags, cpu, namefmt, NULL);
}
EXPORT_SYMBOL(create_kthread_worker_on_cpu);
--
1.8.5.6
Remove code duplication and use the new try_lock_kthread_work()
function in flush_kthread_work() as well.
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/kthread.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index dbd090466e2a..f7caaaca5825 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -933,16 +933,12 @@ void flush_kthread_work(struct kthread_work *work)
struct kthread_worker *worker;
bool noop = false;
-retry:
- worker = work->worker;
- if (!worker)
+ local_irq_disable();
+ if (!try_lock_kthread_work(work)) {
+ local_irq_enable();
return;
-
- spin_lock_irq(&worker->lock);
- if (work->worker != worker) {
- spin_unlock_irq(&worker->lock);
- goto retry;
}
+ worker = work->worker;
if (!list_empty(&work->node))
insert_kthread_work(worker, &fwork.work, work->node.next);
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
waking up. In many cases it is unclear what state the kthread
is in and sometimes it is done in a wrong way.
The plan is to convert kthreads into the kthread worker or workqueue
API. It allows splitting the functionality into separate operations
and helps to make a better structure. Also it defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread might
sleep or even be safely migrated.
The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows better control, e.g.
defining a scheduling priority.
This patch converts khugepaged() into the kthread worker API
because it modifies the scheduling priority.
The functionality is preserved except that we do not wake up the worker
when it is already created and someone calls start() once again.
set_freezable() is not needed because the kthread worker is
created as freezable.
set_user_nice() is called from start_stop_khugepaged(). It no longer
needs to be done from within the kthread.
The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and queuing. I admit that this was a bit easier before because wake_up()
was a nop when the kthread did not exist.
Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.
The delay between scans is implemented using a delayed work.
Note that the @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose and therefore it was renamed to better describe
the current use.
Also note that we can no longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled() that returns false when we are going down.
Signed-off-by: Petr Mladek <[email protected]>
---
mm/huge_memory.c | 134 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 72 insertions(+), 62 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c29ddebc8705..be9153bfa506 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -57,10 +57,17 @@ static unsigned int khugepaged_full_scans;
static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
/* during fragmentation poll the hugepage allocator once every minute */
static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static DEFINE_DELAYED_KTHREAD_WORK(khugepaged_do_scan_work,
+ khugepaged_do_scan_func);
+static DEFINE_KTHREAD_WORK(khugepaged_cleanup_work, khugepaged_cleanup_func);
+
static DEFINE_MUTEX(khugepaged_mutex);
static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
/*
* default collapse hugepages if there is at least one pte mapped like
* it would have happened if the vma was large enough during page
@@ -68,7 +75,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
*/
static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
-static int khugepaged(void *none);
static int khugepaged_slab_init(void);
static void khugepaged_slab_exit(void);
@@ -144,29 +150,50 @@ static void set_recommended_min_free_kbytes(void)
setup_per_zone_wmarks();
}
+static int khugepaged_has_work(void)
+{
+ return !list_empty(&khugepaged_scan.mm_head);
+}
+
static int start_stop_khugepaged(void)
{
+ struct kthread_worker *worker;
int err = 0;
+
if (khugepaged_enabled()) {
- if (!khugepaged_thread)
- khugepaged_thread = kthread_run(khugepaged, NULL,
- "khugepaged");
- if (IS_ERR(khugepaged_thread)) {
- pr_err("khugepaged: kthread_run(khugepaged) failed\n");
- err = PTR_ERR(khugepaged_thread);
- khugepaged_thread = NULL;
- goto fail;
+ if (khugepaged_worker)
+ goto out;
+
+ worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+ if (IS_ERR(worker)) {
+ pr_err("khugepaged: failed to create kthread worker\n");
+ goto out;
}
- if (!list_empty(&khugepaged_scan.mm_head))
- wake_up_interruptible(&khugepaged_wait);
+ set_user_nice(worker->task, MAX_NICE);
+
+ /* Make the worker public and check for work synchronously. */
+ spin_lock(&khugepaged_mm_lock);
+ khugepaged_worker = worker;
+ if (khugepaged_has_work())
+ queue_delayed_kthread_work(worker,
+ &khugepaged_do_scan_work,
+ 0);
+ spin_unlock(&khugepaged_mm_lock);
set_recommended_min_free_kbytes();
- } else if (khugepaged_thread) {
- kthread_stop(khugepaged_thread);
- khugepaged_thread = NULL;
+ } else if (khugepaged_worker) {
+ /* First, stop others from using the worker. */
+ spin_lock(&khugepaged_mm_lock);
+ worker = khugepaged_worker;
+ khugepaged_worker = NULL;
+ spin_unlock(&khugepaged_mm_lock);
+
+ cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+ queue_kthread_work(worker, &khugepaged_cleanup_work);
+ destroy_kthread_worker(worker);
}
-fail:
+out:
return err;
}
@@ -425,7 +452,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
return -EINVAL;
khugepaged_scan_sleep_millisecs = msecs;
- wake_up_interruptible(&khugepaged_wait);
+
+ spin_lock(&khugepaged_mm_lock);
+ if (khugepaged_worker && khugepaged_has_work())
+ mod_delayed_kthread_work(khugepaged_worker,
+ &khugepaged_do_scan_work,
+ 0);
+ spin_unlock(&khugepaged_mm_lock);
return count;
}
@@ -452,7 +485,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
return -EINVAL;
khugepaged_alloc_sleep_millisecs = msecs;
- wake_up_interruptible(&khugepaged_wait);
+ wake_up_interruptible(&khugepaged_alloc_wait);
return count;
}
@@ -2094,7 +2127,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
int __khugepaged_enter(struct mm_struct *mm)
{
struct mm_slot *mm_slot;
- int wakeup;
+ int has_work;
mm_slot = alloc_mm_slot();
if (!mm_slot)
@@ -2113,13 +2146,15 @@ int __khugepaged_enter(struct mm_struct *mm)
* Insert just behind the scanning cursor, to let the area settle
* down a little.
*/
- wakeup = list_empty(&khugepaged_scan.mm_head);
+ has_work = khugepaged_has_work();
list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
- spin_unlock(&khugepaged_mm_lock);
atomic_inc(&mm->mm_count);
- if (wakeup)
- wake_up_interruptible(&khugepaged_wait);
+ if (khugepaged_worker && has_work)
+ mod_delayed_kthread_work(khugepaged_worker,
+ &khugepaged_do_scan_work,
+ 0);
+ spin_unlock(&khugepaged_mm_lock);
return 0;
}
@@ -2335,10 +2370,10 @@ static void khugepaged_alloc_sleep(void)
{
DEFINE_WAIT(wait);
- add_wait_queue(&khugepaged_wait, &wait);
+ add_wait_queue(&khugepaged_alloc_wait, &wait);
freezable_schedule_timeout_interruptible(
msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
- remove_wait_queue(&khugepaged_wait, &wait);
+ remove_wait_queue(&khugepaged_alloc_wait, &wait);
}
static int khugepaged_node_load[MAX_NUMNODES];
@@ -2847,19 +2882,7 @@ breakouterloop_mmap_sem:
return progress;
}
-static int khugepaged_has_work(void)
-{
- return !list_empty(&khugepaged_scan.mm_head) &&
- khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
- return !list_empty(&khugepaged_scan.mm_head) ||
- kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
{
struct page *hpage = NULL;
unsigned int progress = 0, pass_through_head = 0;
@@ -2874,7 +2897,7 @@ static void khugepaged_do_scan(void)
cond_resched();
- if (unlikely(kthread_should_stop() || try_to_freeze()))
+ if (unlikely(!khugepaged_enabled() || try_to_freeze()))
break;
spin_lock(&khugepaged_mm_lock);
@@ -2891,43 +2914,30 @@ static void khugepaged_do_scan(void)
if (!IS_ERR_OR_NULL(hpage))
put_page(hpage);
-}
-static void khugepaged_wait_work(void)
-{
if (khugepaged_has_work()) {
- if (!khugepaged_scan_sleep_millisecs)
- return;
- wait_event_freezable_timeout(khugepaged_wait,
- kthread_should_stop(),
- msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
- return;
- }
+ unsigned long delay = 0;
- if (khugepaged_enabled())
- wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+ if (khugepaged_scan_sleep_millisecs)
+ delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
+
+ queue_delayed_kthread_work(khugepaged_worker,
+ &khugepaged_do_scan_work,
+ delay);
+ }
}
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
{
struct mm_slot *mm_slot;
- set_freezable();
- set_user_nice(current, MAX_NICE);
-
- while (!kthread_should_stop()) {
- khugepaged_do_scan();
- khugepaged_wait_work();
- }
-
spin_lock(&khugepaged_mm_lock);
mm_slot = khugepaged_scan.mm_slot;
khugepaged_scan.mm_slot = NULL;
if (mm_slot)
collect_mm_slot(mm_slot);
spin_unlock(&khugepaged_mm_lock);
- return 0;
}
static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the ring buffer benchmark producer into a kthread
worker because it modifies the scheduling priority and policy.
Also, it is a benchmark. It makes the CPU very busy and will most
likely run only for a limited time. IMHO, it does not make sense to
clutter the system workqueues with it.
The thread is split into two independent works. It might look more
complicated, but it helped me to find a race in the sleeping part
that was fixed separately.
kthread_should_stop() can no longer be used inside the works
because it is tied to the life of the worker, which needs to stay
usable until all works are done. Instead, we add the @test_end
global variable. It is set during normal termination, in contrast
with @test_error.
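To make the resulting flow easier to see, here is a condensed sketch of
two cooperating works and the shutdown path. The example_* names are
invented, the helpers are the ones introduced earlier in this series,
and the real code in the diff below differs in the details:

#include <linux/kthread.h>
#include <linux/jiffies.h>

static int example_end;			/* plays the role of @test_end */
static struct kthread_worker *example_producer_worker;
static struct kthread_worker *example_consumer_worker;

static void example_consumer_func(struct kthread_work *dummy)
{
	/* ... read events until told to finish ... */
}
static DEFINE_KTHREAD_WORK(example_consumer_work, example_consumer_func);

static void example_producer_func(struct kthread_work *dummy);
static DEFINE_DELAYED_KTHREAD_WORK(example_producer_work,
				   example_producer_func);

static void example_producer_func(struct kthread_work *dummy)
{
	if (example_end)
		return;

	/* kick the consumer and hammer the ring buffer for a while */
	queue_kthread_work(example_consumer_worker, &example_consumer_work);
	/* ... produce events ... */

	/* the sleep is replaced by re-queuing with a delay */
	queue_delayed_kthread_work(example_producer_worker,
				   &example_producer_work, 10 * HZ);
}

static void example_exit(void)
{
	example_end = 1;
	cancel_delayed_kthread_work_sync(&example_producer_work);
	destroy_kthread_worker(example_producer_worker);
	destroy_kthread_worker(example_consumer_worker);
}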
Signed-off-by: Petr Mladek <[email protected]>
---
kernel/trace/ring_buffer_benchmark.c | 133 ++++++++++++++++-------------------
1 file changed, 59 insertions(+), 74 deletions(-)
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 6df9a83e20d7..7ff443f1e406 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -26,10 +26,17 @@ static int wakeup_interval = 100;
static int reader_finish;
static DECLARE_COMPLETION(read_start);
static DECLARE_COMPLETION(read_done);
-
static struct ring_buffer *buffer;
-static struct task_struct *producer;
-static struct task_struct *consumer;
+
+static void rb_producer_hammer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_producer_worker;
+static DEFINE_DELAYED_KTHREAD_WORK(rb_producer_hammer_work,
+ rb_producer_hammer_func);
+
+static void rb_consumer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_consumer_worker;
+static DEFINE_KTHREAD_WORK(rb_consumer_work, rb_consumer_func);
+
static unsigned long read;
static unsigned int disable_reader;
@@ -61,6 +68,7 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
static int read_events;
static int test_error;
+static int test_end;
#define TEST_ERROR() \
do { \
@@ -77,7 +85,7 @@ enum event_status {
static bool break_test(void)
{
- return test_error || kthread_should_stop();
+ return test_error || test_end;
}
static enum event_status read_event(int cpu)
@@ -262,8 +270,8 @@ static void ring_buffer_producer(void)
end_time = ktime_get();
cnt++;
- if (consumer && !(cnt % wakeup_interval))
- wake_up_process(consumer);
+ if (rb_consumer_worker && !(cnt % wakeup_interval))
+ wake_up_process(rb_consumer_worker->task);
#ifndef CONFIG_PREEMPT
/*
@@ -281,14 +289,14 @@ static void ring_buffer_producer(void)
} while (ktime_before(end_time, timeout) && !break_test());
trace_printk("End ring buffer hammer\n");
- if (consumer) {
+ if (rb_consumer_worker) {
/* Init both completions here to avoid races */
init_completion(&read_start);
init_completion(&read_done);
/* the completions must be visible before the finish var */
smp_wmb();
reader_finish = 1;
- wake_up_process(consumer);
+ wake_up_process(rb_consumer_worker->task);
wait_for_completion(&read_done);
}
@@ -366,68 +374,39 @@ static void ring_buffer_producer(void)
}
}
-static void wait_to_die(void)
-{
- set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) {
- schedule();
- set_current_state(TASK_INTERRUPTIBLE);
- }
- __set_current_state(TASK_RUNNING);
-}
-
-static int ring_buffer_consumer_thread(void *arg)
+static void rb_consumer_func(struct kthread_work *dummy)
{
- while (!break_test()) {
- complete(&read_start);
-
- ring_buffer_consumer();
+ complete(&read_start);
- set_current_state(TASK_INTERRUPTIBLE);
- if (break_test())
- break;
- schedule();
- }
- __set_current_state(TASK_RUNNING);
-
- if (!kthread_should_stop())
- wait_to_die();
-
- return 0;
+ ring_buffer_consumer();
}
-static int ring_buffer_producer_thread(void *arg)
+static void rb_producer_hammer_func(struct kthread_work *dummy)
{
- while (!break_test()) {
- ring_buffer_reset(buffer);
+ if (break_test())
+ return;
- if (consumer) {
- wake_up_process(consumer);
- wait_for_completion(&read_start);
- }
-
- ring_buffer_producer();
- if (break_test())
- goto out_kill;
+ ring_buffer_reset(buffer);
- trace_printk("Sleeping for 10 secs\n");
- set_current_state(TASK_INTERRUPTIBLE);
- if (break_test())
- goto out_kill;
- schedule_timeout(HZ * SLEEP_TIME);
+ if (rb_consumer_worker) {
+ queue_kthread_work(rb_consumer_worker, &rb_consumer_work);
+ wait_for_completion(&read_start);
}
-out_kill:
- __set_current_state(TASK_RUNNING);
- if (!kthread_should_stop())
- wait_to_die();
+ ring_buffer_producer();
- return 0;
+ if (break_test())
+ return;
+
+ trace_printk("Sleeping for 10 secs\n");
+ queue_delayed_kthread_work(rb_producer_worker,
+ &rb_producer_hammer_work,
+ HZ * SLEEP_TIME);
}
static int __init ring_buffer_benchmark_init(void)
{
- int ret;
+ int ret = 0;
/* make a one meg buffer in overwite mode */
buffer = ring_buffer_alloc(1000000, RB_FL_OVERWRITE);
@@ -435,19 +414,21 @@ static int __init ring_buffer_benchmark_init(void)
return -ENOMEM;
if (!disable_reader) {
- consumer = kthread_create(ring_buffer_consumer_thread,
- NULL, "rb_consumer");
- ret = PTR_ERR(consumer);
- if (IS_ERR(consumer))
+ rb_consumer_worker = create_kthread_worker(0, "rb_consumer");
+ if (IS_ERR(rb_consumer_worker)) {
+ ret = PTR_ERR(rb_consumer_worker);
goto out_fail;
+ }
}
- producer = kthread_run(ring_buffer_producer_thread,
- NULL, "rb_producer");
- ret = PTR_ERR(producer);
-
- if (IS_ERR(producer))
+ rb_producer_worker = create_kthread_worker(0, "rb_producer");
+ if (IS_ERR(rb_producer_worker)) {
+ ret = PTR_ERR(rb_producer_worker);
goto out_kill;
+ }
+
+ queue_delayed_kthread_work(rb_producer_worker,
+ &rb_producer_hammer_work, 0);
/*
* Run them as low-prio background tasks by default:
@@ -457,24 +438,26 @@ static int __init ring_buffer_benchmark_init(void)
struct sched_param param = {
.sched_priority = consumer_fifo
};
- sched_setscheduler(consumer, SCHED_FIFO, &param);
+ sched_setscheduler(rb_consumer_worker->task,
+ SCHED_FIFO, &param);
} else
- set_user_nice(consumer, consumer_nice);
+ set_user_nice(rb_consumer_worker->task, consumer_nice);
}
if (producer_fifo >= 0) {
struct sched_param param = {
.sched_priority = producer_fifo
};
- sched_setscheduler(producer, SCHED_FIFO, &param);
+ sched_setscheduler(rb_producer_worker->task,
+ SCHED_FIFO, &param);
} else
- set_user_nice(producer, producer_nice);
+ set_user_nice(rb_producer_worker->task, producer_nice);
return 0;
out_kill:
- if (consumer)
- kthread_stop(consumer);
+ if (rb_consumer_worker)
+ destroy_kthread_worker(rb_consumer_worker);
out_fail:
ring_buffer_free(buffer);
@@ -483,9 +466,11 @@ static int __init ring_buffer_benchmark_init(void)
static void __exit ring_buffer_benchmark_exit(void)
{
- kthread_stop(producer);
- if (consumer)
- kthread_stop(consumer);
+ test_end = 1;
+ cancel_delayed_kthread_work_sync(&rb_producer_hammer_work);
+ destroy_kthread_worker(rb_producer_worker);
+ if (rb_consumer_worker)
+ destroy_kthread_worker(rb_consumer_worker);
ring_buffer_free(buffer);
}
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts hungtaskd() into the kthread worker API because
it modifies the priority.
The conversion is pretty straightforward. One iteration of the
main cycle is transferred into a self-queuing delayed kthread work.
We no longer need to check whether the kthread was woken earlier.
Instead, the work timeout is modified when the timeout value is
changed.
The user nice value is set from hung_task_init(). Otherwise, we
would need to add an extra init work.
The patch also handles the error when the kthread worker could not
be created for some reason. This was broken before. For example,
wake_up_process() would have failed if watchdog_task included an error
code instead of a valid pointer.
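The idea of re-arming the delayed work instead of waking the kthread
can be sketched like this. The example_* names are illustrative;
mod_delayed_kthread_work() and the other helpers are the ones added
earlier in this series:

#include <linux/kthread.h>
#include <linux/jiffies.h>

static struct kthread_worker *example_worker;
static unsigned long example_timeout_secs = 120;

static void example_watchdog_func(struct kthread_work *dummy);
static DEFINE_DELAYED_KTHREAD_WORK(example_watchdog_work,
				   example_watchdog_func);

static void example_watchdog_func(struct kthread_work *dummy)
{
	/* ... check for hung tasks ... */

	/* self-queue for the next period */
	queue_delayed_kthread_work(example_worker, &example_watchdog_work,
				   example_timeout_secs * HZ);
}

/* Called when the user changes the timeout, e.g. from a sysctl handler. */
static void example_timeout_changed(unsigned long secs)
{
	example_timeout_secs = secs;
	if (example_worker)
		mod_delayed_kthread_work(example_worker,
					 &example_watchdog_work,
					 secs * HZ);
}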
Signed-off-by: Petr Mladek <[email protected]>
CC: [email protected]
---
kernel/hung_task.c | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index e0f90c2b57aa..65026f8b750e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -41,7 +41,9 @@ int __read_mostly sysctl_hung_task_warnings = 10;
static int __read_mostly did_panic;
-static struct task_struct *watchdog_task;
+static struct kthread_worker *watchdog_worker;
+static void watchdog_func(struct kthread_work *dummy);
+static DEFINE_DELAYED_KTHREAD_WORK(watchdog_work, watchdog_func);
/*
* Should we panic (and reboot, if panic_timeout= is set) when a
@@ -205,7 +207,9 @@ int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
if (ret || !write)
goto out;
- wake_up_process(watchdog_task);
+ if (watchdog_worker)
+ mod_delayed_kthread_work(watchdog_worker, &watchdog_work,
+ timeout_jiffies(sysctl_hung_task_timeout_secs));
out:
return ret;
@@ -222,30 +226,35 @@ EXPORT_SYMBOL_GPL(reset_hung_task_detector);
/*
* kthread which checks for tasks stuck in D state
*/
-static int watchdog(void *dummy)
+static void watchdog_func(struct kthread_work *dummy)
{
- set_user_nice(current, 0);
+ unsigned long timeout = sysctl_hung_task_timeout_secs;
- for ( ; ; ) {
- unsigned long timeout = sysctl_hung_task_timeout_secs;
+ if (atomic_xchg(&reset_hung_task, 0))
+ goto next;
- while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
- timeout = sysctl_hung_task_timeout_secs;
+ check_hung_uninterruptible_tasks(timeout);
- if (atomic_xchg(&reset_hung_task, 0))
- continue;
-
- check_hung_uninterruptible_tasks(timeout);
- }
-
- return 0;
+next:
+ queue_delayed_kthread_work(watchdog_worker, &watchdog_work,
+ timeout_jiffies(timeout));
}
static int __init hung_task_init(void)
{
+ struct kthread_worker *worker;
+
atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
- watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
+ worker = create_kthread_worker(0, "khungtaskd");
+ if (IS_ERR(worker)) {
+ pr_warn("Failed to create khungtaskd\n");
+ goto out;
+ }
+ watchdog_worker = worker;
+ set_user_nice(worker->task, 0);
+ queue_delayed_kthread_work(worker, &watchdog_work, 0);
+out:
return 0;
}
subsys_initcall(hung_task_init);
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the kmemleak kthread into the kthread worker
API because it modifies the scheduling priority.
The result is a simple self-queuing work that just calls kmemleak_scan().
The info messages and set_user_nice() are moved to the functions that
start and stop the worker. These are also renamed to mention a worker
instead of a thread.
We no longer need to handle spurious wakeups and count the remaining
timeout; this is handled by the worker. The delayed work is queued
after the full timeout passes.
Finally, the initial delay is applied only when the kthread is started
during boot. For this, a parameter was added to the start function.
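The shape of the start/stop pair can be sketched as follows. The
example_* names are invented and the delays are placeholders; the real
functions in the diff below keep their *_scan_thread names:

#include <linux/kthread.h>
#include <linux/jiffies.h>
#include <linux/err.h>

static struct kthread_worker *example_scan_worker;

static void example_scan_func(struct kthread_work *dummy);
static DEFINE_DELAYED_KTHREAD_WORK(example_scan_work, example_scan_func);

static void example_scan_func(struct kthread_work *dummy)
{
	/* ... one scan pass ... */

	/* wait the full period before the next scan */
	queue_delayed_kthread_work(example_scan_worker, &example_scan_work,
				   600 * HZ);
}

static void example_start_scan(bool boot)
{
	unsigned long delay = 0;

	if (example_scan_worker)
		return;

	example_scan_worker = create_kthread_worker(0, "example_scan");
	if (IS_ERR(example_scan_worker)) {
		example_scan_worker = NULL;
		return;
	}

	/* let the system settle down before the very first scan */
	if (boot)
		delay = 60 * HZ;

	queue_delayed_kthread_work(example_scan_worker, &example_scan_work,
				   delay);
}

static void example_stop_scan(void)
{
	if (!example_scan_worker)
		return;

	cancel_delayed_kthread_work_sync(&example_scan_work);
	destroy_kthread_worker(example_scan_worker);
	example_scan_worker = NULL;
}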
Signed-off-by: Petr Mladek <[email protected]>
CC: Catalin Marinas <[email protected]>
---
mm/kmemleak.c | 86 +++++++++++++++++++++++++++++------------------------------
1 file changed, 42 insertions(+), 44 deletions(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 19423a45d7d7..cb447d86a0e1 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -217,7 +217,9 @@ static int kmemleak_error;
static unsigned long min_addr = ULONG_MAX;
static unsigned long max_addr;
-static struct task_struct *scan_thread;
+static struct kthread_worker *kmemleak_scan_worker;
+static void kmemleak_scan_func(struct kthread_work *dummy);
+static DEFINE_DELAYED_KTHREAD_WORK(kmemleak_scan_work, kmemleak_scan_func);
/* used to avoid reporting of recently allocated objects */
static unsigned long jiffies_min_age;
static unsigned long jiffies_last_scan;
@@ -1471,54 +1473,46 @@ static void kmemleak_scan(void)
}
/*
- * Thread function performing automatic memory scanning. Unreferenced objects
- * at the end of a memory scan are reported but only the first time.
+ * Kthread worker function performing automatic memory scanning.
+ * Unreferenced objects at the end of a memory scan are reported
+ * but only the first time.
*/
-static int kmemleak_scan_thread(void *arg)
+static void kmemleak_scan_func(struct kthread_work *dummy)
{
- static int first_run = 1;
-
- pr_info("Automatic memory scanning thread started\n");
- set_user_nice(current, 10);
-
- /*
- * Wait before the first scan to allow the system to fully initialize.
- */
- if (first_run) {
- first_run = 0;
- ssleep(SECS_FIRST_SCAN);
- }
-
- while (!kthread_should_stop()) {
- signed long timeout = jiffies_scan_wait;
-
- mutex_lock(&scan_mutex);
- kmemleak_scan();
- mutex_unlock(&scan_mutex);
-
- /* wait before the next scan */
- while (timeout && !kthread_should_stop())
- timeout = schedule_timeout_interruptible(timeout);
- }
-
- pr_info("Automatic memory scanning thread ended\n");
+ mutex_lock(&scan_mutex);
+ kmemleak_scan();
+ mutex_unlock(&scan_mutex);
- return 0;
+ queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+ jiffies_scan_wait);
}
/*
* Start the automatic memory scanning thread. This function must be called
* with the scan_mutex held.
*/
-static void start_scan_thread(void)
+static void start_scan_thread(bool boot)
{
- if (scan_thread)
+ unsigned long timeout = 0;
+
+ if (kmemleak_scan_worker)
return;
- scan_thread = kthread_run(kmemleak_scan_thread, NULL, "kmemleak");
- if (IS_ERR(scan_thread)) {
- pr_warning("Failed to create the scan thread\n");
- scan_thread = NULL;
+ kmemleak_scan_worker = create_kthread_worker(0, "kmemleak");
+ if (IS_ERR(kmemleak_scan_worker)) {
+ pr_warn("Failed to create the memory scan worker\n");
+ kmemleak_scan_worker = NULL;
}
+ pr_info("Automatic memory scanning thread started\n");
+ set_user_nice(kmemleak_scan_worker->task, 10);
+
+ /*
+ * Wait before the first scan to allow the system to fully initialize.
+ */
+ if (boot)
+ timeout = msecs_to_jiffies(SECS_FIRST_SCAN * MSEC_PER_SEC);
+
+ queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+ timeout);
}
/*
@@ -1527,10 +1521,14 @@ static void start_scan_thread(void)
*/
static void stop_scan_thread(void)
{
- if (scan_thread) {
- kthread_stop(scan_thread);
- scan_thread = NULL;
- }
+ if (!kmemleak_scan_worker)
+ return;
+
+ cancel_delayed_kthread_work_sync(&kmemleak_scan_work);
+ destroy_kthread_worker(kmemleak_scan_worker);
+ kmemleak_scan_worker = NULL;
+
+ pr_info("Automatic memory scanning thread ended\n");
}
/*
@@ -1727,7 +1725,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
else if (strncmp(buf, "stack=off", 9) == 0)
kmemleak_stack_scan = 0;
else if (strncmp(buf, "scan=on", 7) == 0)
- start_scan_thread();
+ start_scan_thread(false);
else if (strncmp(buf, "scan=off", 8) == 0)
stop_scan_thread();
else if (strncmp(buf, "scan=", 5) == 0) {
@@ -1739,7 +1737,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
stop_scan_thread();
if (secs) {
jiffies_scan_wait = msecs_to_jiffies(secs * 1000);
- start_scan_thread();
+ start_scan_thread(false);
}
} else if (strncmp(buf, "scan", 4) == 0)
kmemleak_scan();
@@ -1963,7 +1961,7 @@ static int __init kmemleak_late_init(void)
if (!dentry)
pr_warning("Failed to create the debugfs kmemleak file\n");
mutex_lock(&scan_mutex);
- start_scan_thread();
+ start_scan_thread(true);
mutex_unlock(&scan_mutex);
pr_info("Kernel memory leak detector initialized\n");
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the kipmi kthread into the kthread worker API
because it modifies the scheduling priority. The change is quite
straightforward.
First, we move the per-thread variable "busy_until" into the per-thread
structure struct smi_info. As a side effect, we could omit one parameter
in ipmi_thread_busy_wait(). On the other hand, the structure can no
longer be passed with the const qualifier.
The value of "busy_until" is initialized when the kthread is created.
Also the scheduling priority is set there. This helps to avoid an extra
init work.
One iteration of the kthread cycle is moved to a delayed work function.
The different delays between the cycles are handled the following way:
+ immediate cycle (nope) is converted into a goto within the same work
+ immediate cycle with a possible reschedule is converted into
re-queuing with a zero delay
+ schedule_timeout() is converted into re-queuing with the given
delay
+ interruptible sleep is converted into nothing; the work
will get queued again from check_start_timer_thread().
In other words, the external wake_up_process() gets
replaced by queuing with a zero delay.
Probably the most tricky change is when the worker is being stopped.
We need to explicitly cancel the work to prevent it from re-queuing.
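The mapping above can be condensed into the following sketch. The
example_* names and the enum are invented stand-ins; the real work
function in the diff below deals with the actual smi_event_handler()
results and locking:

#include <linux/kthread.h>

static struct kthread_worker *example_worker;
static void example_ipmi_func(struct kthread_work *work);
static DEFINE_DELAYED_KTHREAD_WORK(example_work, example_ipmi_func);

/* stands in for smi_event_handler() + ipmi_thread_busy_wait() */
enum example_result { EX_AGAIN, EX_AGAIN_BUSY_WAIT, EX_IDLE, EX_OTHER };

static enum example_result example_handle_one_event(void)
{
	return EX_IDLE;			/* stub for the sketch */
}

static void example_ipmi_func(struct kthread_work *work)
{
	enum example_result res;

next:
	res = example_handle_one_event();

	if (res == EX_AGAIN)			/* immediate cycle */
		goto next;
	else if (res == EX_AGAIN_BUSY_WAIT)	/* was schedule() */
		queue_delayed_kthread_work(example_worker, &example_work, 0);
	else if (res == EX_OTHER)		/* was schedule_timeout(1) */
		queue_delayed_kthread_work(example_worker, &example_work, 1);
	/*
	 * EX_IDLE: queue nothing here; the external caller queues the
	 * work with a zero delay instead of calling wake_up_process().
	 */
}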
Signed-off-by: Petr Mladek <[email protected]>
CC: Corey Minyard <[email protected]>
CC: [email protected]
---
drivers/char/ipmi/ipmi_si_intf.c | 116 ++++++++++++++++++++++-----------------
1 file changed, 66 insertions(+), 50 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 654f6f36a071..fdb97eaded4b 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -302,7 +302,9 @@ struct smi_info {
/* Counters and things for the proc filesystem. */
atomic_t stats[SI_NUM_STATS];
- struct task_struct *thread;
+ struct kthread_worker *worker;
+ struct delayed_kthread_work work;
+ struct timespec64 busy_until;
struct list_head link;
union ipmi_smi_info_union addr_info;
@@ -929,8 +931,9 @@ static void check_start_timer_thread(struct smi_info *smi_info)
if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) {
smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
- if (smi_info->thread)
- wake_up_process(smi_info->thread);
+ if (smi_info->worker)
+ mod_delayed_kthread_work(smi_info->worker,
+ &smi_info->work, 0);
start_next_msg(smi_info);
smi_event_handler(smi_info, 0);
@@ -1008,10 +1011,10 @@ static inline int ipmi_si_is_busy(struct timespec64 *ts)
}
static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
- const struct smi_info *smi_info,
- struct timespec64 *busy_until)
+ struct smi_info *smi_info)
{
unsigned int max_busy_us = 0;
+ struct timespec64 *busy_until = &smi_info->busy_until;
if (smi_info->intf_num < num_max_busy_us)
max_busy_us = kipmid_max_busy_us[smi_info->intf_num];
@@ -1042,53 +1045,49 @@ static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
* (if that is enabled). See the paragraph on kimid_max_busy_us in
* Documentation/IPMI.txt for details.
*/
-static int ipmi_thread(void *data)
+static void ipmi_func(struct kthread_work *work)
{
- struct smi_info *smi_info = data;
+ struct smi_info *smi_info = container_of(work, struct smi_info,
+ work.work);
unsigned long flags;
enum si_sm_result smi_result;
- struct timespec64 busy_until;
+ int busy_wait;
- ipmi_si_set_not_busy(&busy_until);
- set_user_nice(current, MAX_NICE);
- while (!kthread_should_stop()) {
- int busy_wait;
+next:
+ spin_lock_irqsave(&(smi_info->si_lock), flags);
+ smi_result = smi_event_handler(smi_info, 0);
- spin_lock_irqsave(&(smi_info->si_lock), flags);
- smi_result = smi_event_handler(smi_info, 0);
+ /*
+ * If the driver is doing something, there is a possible
+ * race with the timer. If the timer handler see idle,
+ * and the thread here sees something else, the timer
+ * handler won't restart the timer even though it is
+ * required. So start it here if necessary.
+ */
+ if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
+ smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
- /*
- * If the driver is doing something, there is a possible
- * race with the timer. If the timer handler see idle,
- * and the thread here sees something else, the timer
- * handler won't restart the timer even though it is
- * required. So start it here if necessary.
- */
- if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
- smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
-
- spin_unlock_irqrestore(&(smi_info->si_lock), flags);
- busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
- &busy_until);
- if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
- ; /* do nothing */
- else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
- schedule();
- else if (smi_result == SI_SM_IDLE) {
- if (atomic_read(&smi_info->need_watch)) {
- schedule_timeout_interruptible(100);
- } else {
- /* Wait to be woken up when we are needed. */
- __set_current_state(TASK_INTERRUPTIBLE);
- schedule();
- }
- } else
- schedule_timeout_interruptible(1);
+ spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+ busy_wait = ipmi_thread_busy_wait(smi_result, smi_info);
+
+ if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
+ goto next;
+ if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait) {
+ queue_delayed_kthread_work(smi_info->worker,
+ &smi_info->work, 0);
+ } else if (smi_result == SI_SM_IDLE) {
+ if (atomic_read(&smi_info->need_watch)) {
+ queue_delayed_kthread_work(smi_info->worker,
+ &smi_info->work, 100);
+ } else {
+ /* Nope. Wait to be queued when we are needed. */
+ }
+ } else {
+ queue_delayed_kthread_work(smi_info->worker,
+ &smi_info->work, 1);
}
- return 0;
}
-
static void poll(void *send_info)
{
struct smi_info *smi_info = send_info;
@@ -1229,17 +1228,29 @@ static int smi_start_processing(void *send_info,
enable = 1;
if (enable) {
- new_smi->thread = kthread_run(ipmi_thread, new_smi,
- "kipmi%d", new_smi->intf_num);
- if (IS_ERR(new_smi->thread)) {
+ struct kthread_worker *worker;
+
+ worker = create_kthread_worker(0, "kipmi%d",
+ new_smi->intf_num);
+
+ if (IS_ERR(worker)) {
dev_notice(new_smi->dev, "Could not start"
" kernel thread due to error %ld, only using"
" timers to drive the interface\n",
- PTR_ERR(new_smi->thread));
- new_smi->thread = NULL;
+ PTR_ERR(worker));
+ goto out;
}
+
+ ipmi_si_set_not_busy(&new_smi->busy_until);
+ set_user_nice(worker->task, MAX_NICE);
+
+ init_delayed_kthread_work(&new_smi->work, ipmi_func);
+ queue_delayed_kthread_work(worker, &new_smi->work, 0);
+
+ new_smi->worker = worker;
}
+out:
return 0;
}
@@ -3414,8 +3425,13 @@ static void check_for_broken_irqs(struct smi_info *smi_info)
static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
{
- if (smi_info->thread != NULL)
- kthread_stop(smi_info->thread);
+ if (smi_info->worker != NULL) {
+ struct kthread_worker *worker = smi_info->worker;
+
+ smi_info->worker = NULL;
+ cancel_delayed_kthread_work_sync(&smi_info->work);
+ destroy_kthread_worker(worker);
+ }
if (smi_info->timer_running)
del_timer_sync(&smi_info->si_timer);
}
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the fmr_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It is quite
possible that it does not need a dedicated kthread and a workqueue
would be perfectly fine. Well, the conversion between the kthread
worker API and workqueues is pretty trivial.
The patch moves one iteration from the kthread into the work function.
It preserves the check for a spurious queuing (wake up). Then it
processes one request. Finally, it re-queues itself if more requests
are pending.
Otherwise, wake_up_process() is replaced by queuing the work.
Important: The change is only compile tested. I did not find an easy
way to test it in real life.
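Reduced to its skeleton, the resulting work function looks roughly like
this. The example_* names are invented and only the request/flush
counters and the re-queuing are shown:

#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/atomic.h>

struct example_pool {
	struct kthread_worker *worker;
	struct kthread_work work;
	atomic_t req_ser;
	atomic_t flush_ser;
};

static void example_cleanup_func(struct kthread_work *work)
{
	struct example_pool *pool =
		container_of(work, struct example_pool, work);

	/*
	 * The work might be queued twice for the same request: once by
	 * the caller and once by the re-queuing below. Bail out when
	 * there is nothing to do.
	 */
	if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0)
		return;

	/* ... release one batch and notify waiters ... */
	atomic_inc(&pool->flush_ser);

	/* more requests arrived in the meantime? queue ourselves again */
	if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
		queue_kthread_work(pool->worker, &pool->work);
}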
Signed-off-by: Petr Mladek <[email protected]>
CC: Doug Ledford <[email protected]>
CC: Sean Hefty <[email protected]>
CC: Hal Rosenstock <[email protected]>
CC: [email protected]
---
drivers/infiniband/core/fmr_pool.c | 54 ++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 29 deletions(-)
diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
index 9f5ad7cc33c8..5f2b06bd14da 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -96,7 +96,8 @@ struct ib_fmr_pool {
void * arg);
void *flush_arg;
- struct task_struct *thread;
+ struct kthread_worker *worker;
+ struct kthread_work work;
atomic_t req_ser;
atomic_t flush_ser;
@@ -174,29 +175,26 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
spin_unlock_irq(&pool->pool_lock);
}
-static int ib_fmr_cleanup_thread(void *pool_ptr)
+static void ib_fmr_cleanup_func(struct kthread_work *work)
{
- struct ib_fmr_pool *pool = pool_ptr;
+ struct ib_fmr_pool *pool = container_of(work, struct ib_fmr_pool, work);
- do {
- if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
- ib_fmr_batch_release(pool);
-
- atomic_inc(&pool->flush_ser);
- wake_up_interruptible(&pool->force_wait);
+ /*
+ * The same request might be queued twice when it appears and
+ * by re-queuing from this work.
+ */
+ if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0)
+ return;
- if (pool->flush_function)
- pool->flush_function(pool, pool->flush_arg);
- }
+ ib_fmr_batch_release(pool);
+ atomic_inc(&pool->flush_ser);
+ wake_up_interruptible(&pool->force_wait);
- set_current_state(TASK_INTERRUPTIBLE);
- if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
- !kthread_should_stop())
- schedule();
- __set_current_state(TASK_RUNNING);
- } while (!kthread_should_stop());
+ if (pool->flush_function)
+ pool->flush_function(pool, pool->flush_arg);
- return 0;
+ if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
+ queue_kthread_work(pool->worker, &pool->work);
}
/**
@@ -286,15 +284,13 @@ struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd *pd,
atomic_set(&pool->flush_ser, 0);
init_waitqueue_head(&pool->force_wait);
- pool->thread = kthread_run(ib_fmr_cleanup_thread,
- pool,
- "ib_fmr(%s)",
- device->name);
- if (IS_ERR(pool->thread)) {
- printk(KERN_WARNING PFX "couldn't start cleanup thread\n");
- ret = PTR_ERR(pool->thread);
+ pool->worker = create_kthread_worker(0, "ib_fmr(%s)", device->name);
+ if (IS_ERR(pool->worker)) {
+ pr_warn(PFX "couldn't start cleanup kthread worker\n");
+ ret = PTR_ERR(pool->worker);
goto out_free_pool;
}
+ init_kthread_work(&pool->work, ib_fmr_cleanup_func);
{
struct ib_pool_fmr *fmr;
@@ -362,7 +358,7 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
LIST_HEAD(fmr_list);
int i;
- kthread_stop(pool->thread);
+ destroy_kthread_worker(pool->worker);
ib_fmr_batch_release(pool);
i = 0;
@@ -412,7 +408,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
spin_unlock_irq(&pool->pool_lock);
serial = atomic_inc_return(&pool->req_ser);
- wake_up_process(pool->thread);
+ queue_kthread_work(pool->worker, &pool->work);
if (wait_event_interruptible(pool->force_wait,
atomic_read(&pool->flush_ser) - serial >= 0))
@@ -526,7 +522,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
list_add_tail(&fmr->list, &pool->dirty_list);
if (++pool->dirty_len >= pool->dirty_watermark) {
atomic_inc(&pool->req_ser);
- wake_up_process(pool->thread);
+ queue_kthread_work(pool->worker, &pool->work);
}
}
}
--
1.8.5.6
There is an attempt to print debug messages when the kthread is woken
and when it goes to sleep. It does not work well because the spin lock
does not guard all manipulations of the thread state.
I did not find a way to print a message when the kthread really
goes to sleep. Instead, I added a state variable. It clearly marks
when a series of IO requests is started and finished. It makes sure
that we always have a pair of started/done messages.
The only problem is that it will print these messages also when
the kthread is created and there is no real work. We might want
to use kthread_create() instead of kthread_run(). Then the kthread
would stay stopped until the first request.
Important: This change is only compile tested. I did not find an easy
way to test it. This is why I was conservative and did not modify
the kthread creation.
Signed-off-by: Petr Mladek <[email protected]>
CC: Maxim Levitsky <[email protected]>
---
drivers/memstick/host/r592.c | 19 +++++++++----------
drivers/memstick/host/r592.h | 2 +-
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index ef09ba0289d7..dd73c5506fdf 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -568,21 +568,24 @@ static int r592_process_thread(void *data)
{
int error;
struct r592_device *dev = (struct r592_device *)data;
- unsigned long flags;
while (!kthread_should_stop()) {
- spin_lock_irqsave(&dev->io_thread_lock, flags);
+ if (!dev->io_started) {
+ dbg_verbose("IO: started");
+ dev->io_started = true;
+ }
+
set_current_state(TASK_INTERRUPTIBLE);
error = memstick_next_req(dev->host, &dev->req);
- spin_unlock_irqrestore(&dev->io_thread_lock, flags);
if (error) {
if (error == -ENXIO || error == -EAGAIN) {
- dbg_verbose("IO: done IO, sleeping");
+ dbg_verbose("IO: done");
} else {
dbg("IO: unknown error from "
"memstick_next_req %d", error);
}
+ dev->io_started = false;
if (kthread_should_stop())
set_current_state(TASK_RUNNING);
@@ -714,15 +717,11 @@ static int r592_set_param(struct memstick_host *host,
static void r592_submit_req(struct memstick_host *host)
{
struct r592_device *dev = memstick_priv(host);
- unsigned long flags;
if (dev->req)
return;
- spin_lock_irqsave(&dev->io_thread_lock, flags);
- if (wake_up_process(dev->io_thread))
- dbg_verbose("IO thread woken to process requests");
- spin_unlock_irqrestore(&dev->io_thread_lock, flags);
+ wake_up_process(dev->io_thread);
}
static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -768,7 +767,6 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
dev->irq = pdev->irq;
spin_lock_init(&dev->irq_lock);
- spin_lock_init(&dev->io_thread_lock);
init_completion(&dev->dma_done);
INIT_KFIFO(dev->pio_fifo);
setup_timer(&dev->detect_timer,
@@ -780,6 +778,7 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
host->set_param = r592_set_param;
r592_check_dma(dev);
+ dev->io_started = false;
dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
if (IS_ERR(dev->io_thread)) {
error = PTR_ERR(dev->io_thread);
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index c5726c1e8832..aa8f0f22f4ce 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -137,10 +137,10 @@ struct r592_device {
void __iomem *mmio;
int irq;
spinlock_t irq_lock;
- spinlock_t io_thread_lock;
struct timer_list detect_timer;
struct task_struct *io_thread;
+ bool io_started;
bool parallel_mode;
DECLARE_KFIFO(pio_fifo, u8, sizeof(u32));
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the r592_io kthread into the kthread worker
API. I am not sure how busy the kthread is or whether anyone would
like to control its resources. It is quite possible that a workqueue
would be perfectly fine. Well, the conversion between the kthread
worker API and workqueues is pretty trivial.
The patch moves one iteration from the kthread into the kthread
worker function. It helps to remove all the hackery with the process
state and kthread_should_stop().
The work is queued instead of waking the thread.
The work is explicitly canceled before the worker is destroyed.
It is self-queuing, and otherwise it might take a long time until
the queue is drained.
Important: The change is only compile tested. I did not find an easy
way to test it in use.
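The teardown ordering matters for any self-queuing work and can be
sketched like this (example_* names are invented): cancel first so the
work cannot re-queue itself, then destroy the worker.

#include <linux/kthread.h>

struct example_device {
	struct kthread_worker *io_worker;
	struct kthread_work io_work;
};

static void example_remove(struct example_device *dev)
{
	/* stop the self-queuing first, otherwise destroy might wait long */
	cancel_kthread_work_sync(&dev->io_work);
	destroy_kthread_worker(dev->io_worker);
}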
Signed-off-by: Petr Mladek <[email protected]>
CC: Maxim Levitsky <[email protected]>
---
drivers/memstick/host/r592.c | 58 ++++++++++++++++++++------------------------
drivers/memstick/host/r592.h | 3 ++-
2 files changed, 28 insertions(+), 33 deletions(-)
diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index dd73c5506fdf..9d7cc27d8edc 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -563,40 +563,32 @@ out:
return;
}
-/* Main request processing thread */
-static int r592_process_thread(void *data)
+/* Main request processing work */
+static void r592_process_func(struct kthread_work *work)
{
int error;
- struct r592_device *dev = (struct r592_device *)data;
-
- while (!kthread_should_stop()) {
- if (!dev->io_started) {
- dbg_verbose("IO: started");
- dev->io_started = true;
- }
-
- set_current_state(TASK_INTERRUPTIBLE);
- error = memstick_next_req(dev->host, &dev->req);
+ struct r592_device *dev =
+ container_of(work, struct r592_device, io_work);
- if (error) {
- if (error == -ENXIO || error == -EAGAIN) {
- dbg_verbose("IO: done");
- } else {
- dbg("IO: unknown error from "
- "memstick_next_req %d", error);
- }
- dev->io_started = false;
+ if (!dev->io_started) {
+ dbg_verbose("IO: started");
+ dev->io_started = true;
+ }
- if (kthread_should_stop())
- set_current_state(TASK_RUNNING);
+ error = memstick_next_req(dev->host, &dev->req);
- schedule();
+ if (error) {
+ if (error == -ENXIO || error == -EAGAIN) {
+ dbg_verbose("IO: done");
} else {
- set_current_state(TASK_RUNNING);
- r592_execute_tpc(dev);
+ dbg("IO: unknown error from memstick_next_req %d",
+ error);
}
+ dev->io_started = false;
+ } else {
+ r592_execute_tpc(dev);
+ queue_kthread_work(dev->io_worker, &dev->io_work);
}
- return 0;
}
/* Reprogram chip to detect change in card state */
@@ -721,7 +713,7 @@ static void r592_submit_req(struct memstick_host *host)
if (dev->req)
return;
- wake_up_process(dev->io_thread);
+ queue_kthread_work(dev->io_worker, &dev->io_work);
}
static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -779,9 +771,10 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
r592_check_dma(dev);
dev->io_started = false;
- dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
- if (IS_ERR(dev->io_thread)) {
- error = PTR_ERR(dev->io_thread);
+ init_kthread_work(&dev->io_work, r592_process_func);
+ dev->io_worker = create_kthread_worker(0, "r592_io");
+ if (IS_ERR(dev->io_worker)) {
+ error = PTR_ERR(dev->io_worker);
goto error5;
}
@@ -807,7 +800,7 @@ error6:
dma_free_coherent(&pdev->dev, PAGE_SIZE, dev->dummy_dma_page,
dev->dummy_dma_page_physical_address);
- kthread_stop(dev->io_thread);
+ destroy_kthread_worker(dev->io_worker);
error5:
iounmap(dev->mmio);
error4:
@@ -827,7 +820,8 @@ static void r592_remove(struct pci_dev *pdev)
/* Stop the processing thread.
That ensures that we won't take any more requests */
- kthread_stop(dev->io_thread);
+ cancel_kthread_work_sync(&dev->io_work);
+ destroy_kthread_worker(dev->io_worker);
r592_enable_device(dev, false);
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index aa8f0f22f4ce..1ac71380ac04 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -139,7 +139,8 @@ struct r592_device {
spinlock_t irq_lock;
struct timer_list detect_timer;
- struct task_struct *io_thread;
+ struct kthread_worker *io_worker;
+ struct kthread_work io_work;
bool io_started;
bool parallel_mode;
--
1.8.5.6
This patch removes some code duplication. It does not modify
the functionality.
Signed-off-by: Petr Mladek <[email protected]>
CC: Zhang Rui <[email protected]>
CC: Eduardo Valentin <[email protected]>
CC: Jacob Pan <[email protected]>
CC: [email protected]
---
drivers/thermal/intel_powerclamp.c | 45 +++++++++++++++++---------------------
1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 6c79588251d5..cb32c38f9828 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -505,10 +505,27 @@ static void poll_pkg_cstate(struct work_struct *dummy)
schedule_delayed_work(&poll_pkg_cstate_work, HZ);
}
+static void start_power_clamp_thread(unsigned long cpu)
+{
+ struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
+ struct task_struct *thread;
+
+ thread = kthread_create_on_node(clamp_thread,
+ (void *) cpu,
+ cpu_to_node(cpu),
+ "kidle_inject/%ld", cpu);
+ if (IS_ERR(thread))
+ return;
+
+ /* bind to cpu here */
+ kthread_bind(thread, cpu);
+ wake_up_process(thread);
+ *p = thread;
+}
+
static int start_power_clamp(void)
{
unsigned long cpu;
- struct task_struct *thread;
/* check if pkg cstate counter is completely 0, abort in this case */
if (!has_pkg_state_counter()) {
@@ -530,20 +547,7 @@ static int start_power_clamp(void)
/* start one thread per online cpu */
for_each_online_cpu(cpu) {
- struct task_struct **p =
- per_cpu_ptr(powerclamp_thread, cpu);
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- /* bind to cpu here */
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
- }
-
+ start_power_clamp_thread(cpu);
}
put_online_cpus();
@@ -575,7 +579,6 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
unsigned long cpu = (unsigned long)hcpu;
- struct task_struct *thread;
struct task_struct **percpu_thread =
per_cpu_ptr(powerclamp_thread, cpu);
@@ -584,15 +587,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
switch (action) {
case CPU_ONLINE:
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%lu", cpu);
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *percpu_thread = thread;
- }
+ start_power_clamp_thread(cpu);
/* prefer BSP as controlling CPU */
if (cpu == 0) {
control_cpu = 0;
--
1.8.5.6
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes it is handled the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate
operations and gives the code a better structure. It also defines
a clean state where no locks are held, no IRQs are blocked, and
the kthread may sleep or even be safely migrated.
The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
setting a scheduling priority.
This patch converts the intel powerclamp kthreads into the kthread
worker API because they need to have good control over the assigned
CPUs.
IMHO, the most natural way is to split one cycle into two works.
The first one does some balancing and lets the CPU work the normal
way for some time. The second work checks what the CPU has done
in the meantime and puts it into a C-state to reach the required
idle time ratio. The delay between the two works is achieved
by the delayed kthread work.
The two works have to share some data that used to be local
variables of the single kthread function. This is achieved
by the new per-CPU struct powerclamp_worker_data. It might look
like a complication. On the other hand, the long original kthread
function was not nice either.
The patch tries to avoid extra init and cleanup works. All the
actions might be done outside the thread. They are moved
to the functions that create or destroy the worker. In particular,
I checked that the timers are assigned to the right CPU.
The two works queue each other. This makes it a bit tricky to
break the cycle when we want to stop the worker. We use the global
and per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make the stop faster.
Note that the canceling is not fully reliable because the handling
of the two variables and the queuing is not synchronized via a lock.
But it is not a big deal because it is just an optimization.
The job is stopped faster than before in most cases.
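A condensed sketch of the two works queuing each other and of the stop
path follows. The example_* names are invented; the struct mirrors the
per-CPU powerclamp_worker_data used below, with most fields left out:

#include <linux/kernel.h>
#include <linux/kthread.h>

struct example_worker_data {
	struct kthread_worker *worker;
	struct kthread_work balancing_work;
	struct delayed_kthread_work idle_injection_work;
	bool clamping;
};

static bool example_global_clamping;

static void example_balancing_func(struct kthread_work *work)
{
	struct example_worker_data *w_data =
		container_of(work, struct example_worker_data,
			     balancing_work);

	/* ... compute the next idle injection interval ... */

	if (example_global_clamping && w_data->clamping)
		queue_delayed_kthread_work(w_data->worker,
					   &w_data->idle_injection_work, 1);
}

static void example_idle_injection_func(struct kthread_work *work)
{
	struct example_worker_data *w_data =
		container_of(work, struct example_worker_data,
			     idle_injection_work.work);

	/* ... inject the idle time ... */

	if (example_global_clamping && w_data->clamping)
		queue_kthread_work(w_data->worker, &w_data->balancing_work);
}

static void example_stop_worker(struct example_worker_data *w_data)
{
	w_data->clamping = false;
	/* make the cleared flag visible before canceling the works */
	smp_wmb();
	cancel_kthread_work_sync(&w_data->balancing_work);
	cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
	destroy_kthread_worker(w_data->worker);
}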
Signed-off-by: Petr Mladek <[email protected]>
CC: Zhang Rui <[email protected]>
CC: Eduardo Valentin <[email protected]>
CC: Jacob Pan <[email protected]>
CC: [email protected]
---
drivers/thermal/intel_powerclamp.c | 287 ++++++++++++++++++++++---------------
1 file changed, 168 insertions(+), 119 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index cb32c38f9828..58ea1862d412 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update
*/
static bool clamping;
+static const struct sched_param sparam = {
+ .sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+ struct kthread_worker *worker;
+ struct kthread_work balancing_work;
+ struct delayed_kthread_work idle_injection_work;
+ struct timer_list wakeup_timer;
+ unsigned int cpu;
+ unsigned int count;
+ unsigned int guard;
+ unsigned int window_size_now;
+ unsigned int target_ratio;
+ unsigned int duration_jiffies;
+ bool clamping;
+};
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
static struct thermal_cooling_device *cooling_dev;
static unsigned long *cpu_clamping_mask; /* bit map for tracking per cpu
- * clamping thread
+ * clamping kthread worker
*/
static unsigned int duration;
@@ -368,100 +384,102 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
return set_target_ratio + guard <= current_ratio;
}
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
{
- int cpunr = (unsigned long)arg;
- DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
- static const struct sched_param param = {
- .sched_priority = MAX_USER_RT_PRIO/2,
- };
- unsigned int count = 0;
- unsigned int target_ratio;
+ struct powerclamp_worker_data *w_data;
+ int sleeptime;
+ unsigned long target_jiffies;
+ unsigned int compensation;
+ int interval; /* jiffies to sleep for each attempt */
- set_bit(cpunr, cpu_clamping_mask);
- set_freezable();
- init_timer_on_stack(&wakeup_timer);
- sched_setscheduler(current, SCHED_FIFO, &param);
-
- while (true == clamping && !kthread_should_stop() &&
- cpu_online(cpunr)) {
- int sleeptime;
- unsigned long target_jiffies;
- unsigned int guard;
- unsigned int compensation = 0;
- int interval; /* jiffies to sleep for each attempt */
- unsigned int duration_jiffies = msecs_to_jiffies(duration);
- unsigned int window_size_now;
-
- try_to_freeze();
- /*
- * make sure user selected ratio does not take effect until
- * the next round. adjust target_ratio if user has changed
- * target such that we can converge quickly.
- */
- target_ratio = set_target_ratio;
- guard = 1 + target_ratio/20;
- window_size_now = window_size;
- count++;
+ w_data = container_of(work, struct powerclamp_worker_data,
+ balancing_work);
- /*
- * systems may have different ability to enter package level
- * c-states, thus we need to compensate the injected idle ratio
- * to achieve the actual target reported by the HW.
- */
- compensation = get_compensation(target_ratio);
- interval = duration_jiffies*100/(target_ratio+compensation);
-
- /* align idle time */
- target_jiffies = roundup(jiffies, interval);
- sleeptime = target_jiffies - jiffies;
- if (sleeptime <= 0)
- sleeptime = 1;
- schedule_timeout_interruptible(sleeptime);
- /*
- * only elected controlling cpu can collect stats and update
- * control parameters.
- */
- if (cpunr == control_cpu && !(count%window_size_now)) {
- should_skip =
- powerclamp_adjust_controls(target_ratio,
- guard, window_size_now);
- smp_mb();
- }
+ /*
+ * make sure user selected ratio does not take effect until
+ * the next round. adjust target_ratio if user has changed
+ * target such that we can converge quickly.
+ */
+ w_data->target_ratio = READ_ONCE(set_target_ratio);
+ w_data->guard = 1 + w_data->target_ratio / 20;
+ w_data->window_size_now = window_size;
+ w_data->duration_jiffies = msecs_to_jiffies(duration);
+ w_data->count++;
+
+ /*
+ * systems may have different ability to enter package level
+ * c-states, thus we need to compensate the injected idle ratio
+ * to achieve the actual target reported by the HW.
+ */
+ compensation = get_compensation(w_data->target_ratio);
+ interval = w_data->duration_jiffies * 100 /
+ (w_data->target_ratio + compensation);
+
+ /* align idle time */
+ target_jiffies = roundup(jiffies, interval);
+ sleeptime = target_jiffies - jiffies;
+ if (sleeptime <= 0)
+ sleeptime = 1;
+
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_delayed_kthread_work(w_data->worker,
+ &w_data->idle_injection_work,
+ sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+ struct powerclamp_worker_data *w_data;
+ unsigned long target_jiffies;
+
+ w_data = container_of(work, struct powerclamp_worker_data,
+ idle_injection_work.work);
+
+ /*
+ * only elected controlling cpu can collect stats and update
+ * control parameters.
+ */
+ if (w_data->cpu == control_cpu &&
+ !(w_data->count % w_data->window_size_now)) {
+ should_skip =
+ powerclamp_adjust_controls(w_data->target_ratio,
+ w_data->guard,
+ w_data->window_size_now);
+ smp_mb();
+ }
- if (should_skip)
- continue;
+ if (should_skip)
+ goto balance;
+
+ target_jiffies = jiffies + w_data->duration_jiffies;
+ mod_timer(&w_data->wakeup_timer, target_jiffies);
+ if (unlikely(local_softirq_pending()))
+ goto balance;
+ /*
+ * stop tick sched during idle time, interrupts are still
+ * allowed. thus jiffies are updated properly.
+ */
+ preempt_disable();
+ /* mwait until target jiffies is reached */
+ while (time_before(jiffies, target_jiffies)) {
+ unsigned long ecx = 1;
+ unsigned long eax = target_mwait;
- target_jiffies = jiffies + duration_jiffies;
- mod_timer(&wakeup_timer, target_jiffies);
- if (unlikely(local_softirq_pending()))
- continue;
/*
- * stop tick sched during idle time, interrupts are still
- * allowed. thus jiffies are updated properly.
+ * REVISIT: may call enter_idle() to notify drivers who
+ * can save power during cpu idle. same for exit_idle()
*/
- preempt_disable();
- /* mwait until target jiffies is reached */
- while (time_before(jiffies, target_jiffies)) {
- unsigned long ecx = 1;
- unsigned long eax = target_mwait;
-
- /*
- * REVISIT: may call enter_idle() to notify drivers who
- * can save power during cpu idle. same for exit_idle()
- */
- local_touch_nmi();
- stop_critical_timings();
- mwait_idle_with_hints(eax, ecx);
- start_critical_timings();
- atomic_inc(&idle_wakeup_counter);
- }
- preempt_enable();
+ local_touch_nmi();
+ stop_critical_timings();
+ mwait_idle_with_hints(eax, ecx);
+ start_critical_timings();
+ atomic_inc(&idle_wakeup_counter);
}
- del_timer_sync(&wakeup_timer);
- clear_bit(cpunr, cpu_clamping_mask);
+ preempt_enable();
- return 0;
+balance:
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
}
/*
@@ -505,22 +523,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
schedule_delayed_work(&poll_pkg_cstate_work, HZ);
}
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
{
- struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
- struct task_struct *thread;
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- if (IS_ERR(thread))
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+ struct kthread_worker *worker;
+
+ worker = create_kthread_worker_on_cpu(KTW_FREEZABLE, cpu,
+ "kidle_inject/%ld");
+ if (IS_ERR(worker))
return;
- /* bind to cpu here */
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
+ w_data->worker = worker;
+ w_data->count = 0;
+ w_data->cpu = cpu;
+ w_data->clamping = true;
+ set_bit(cpu, cpu_clamping_mask);
+ setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+ sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+ init_kthread_work(&w_data->balancing_work, clamp_balancing_func);
+ init_delayed_kthread_work(&w_data->idle_injection_work,
+ clamp_idle_injection_func);
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+ if (!w_data->worker)
+ return;
+
+ w_data->clamping = false;
+ /*
+ * Make sure that all works that get queued after this point see
+ * the clamping disabled. The counter part is not needed because
+ * there is an implicit memory barrier when the queued work
+ * is proceed.
+ */
+ smp_wmb();
+ cancel_kthread_work_sync(&w_data->balancing_work);
+ cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
+ /*
+ * The balancing work still might be queued here because
+ * the handling of the "clapming" variable, cancel, and queue
+ * operations are not synchronized via a lock. But it is not
+ * a big deal. The balancing work is fast and destroy kthread
+ * will wait for it.
+ */
+ del_timer_sync(&w_data->wakeup_timer);
+ clear_bit(w_data->cpu, cpu_clamping_mask);
+ destroy_kthread_worker(w_data->worker);
+
+ w_data->worker = NULL;
}
static int start_power_clamp(void)
@@ -545,9 +599,9 @@ static int start_power_clamp(void)
clamping = true;
schedule_delayed_work(&poll_pkg_cstate_work, 0);
- /* start one thread per online cpu */
+ /* start one kthread worker per online cpu */
for_each_online_cpu(cpu) {
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
}
put_online_cpus();
@@ -557,20 +611,17 @@ static int start_power_clamp(void)
static void end_power_clamp(void)
{
int i;
- struct task_struct *thread;
- clamping = false;
/*
- * make clamping visible to other cpus and give per cpu clamping threads
- * sometime to exit, or gets killed later.
+ * Block requeuing in all the kthread workers. They will drain and
+ * stop faster.
*/
- smp_mb();
- msleep(20);
+ clamping = false;
if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping thread for cpu %d alive, kill\n", i);
- thread = *per_cpu_ptr(powerclamp_thread, i);
- kthread_stop(thread);
+ pr_debug("clamping worker for cpu %d alive, destroy\n",
+ i);
+ stop_power_clamp_worker(i);
}
}
}
@@ -579,15 +630,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
unsigned long cpu = (unsigned long)hcpu;
- struct task_struct **percpu_thread =
- per_cpu_ptr(powerclamp_thread, cpu);
if (false == clamping)
goto exit_ok;
switch (action) {
case CPU_ONLINE:
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
/* prefer BSP as controlling CPU */
if (cpu == 0) {
control_cpu = 0;
@@ -598,7 +647,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
if (test_bit(cpu, cpu_clamping_mask)) {
pr_err("cpu %lu dead but powerclamping thread is not\n",
cpu);
- kthread_stop(*percpu_thread);
+ stop_power_clamp_worker(cpu);
}
if (cpu == control_cpu) {
control_cpu = smp_processor_id();
@@ -785,8 +834,8 @@ static int __init powerclamp_init(void)
window_size = 2;
register_hotcpu_notifier(&powerclamp_cpu_notifier);
- powerclamp_thread = alloc_percpu(struct task_struct *);
- if (!powerclamp_thread) {
+ worker_data = alloc_percpu(struct powerclamp_worker_data);
+ if (!worker_data) {
retval = -ENOMEM;
goto exit_unregister;
}
@@ -806,7 +855,7 @@ static int __init powerclamp_init(void)
return 0;
exit_free_thread:
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
exit_unregister:
unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
exit_free:
@@ -819,7 +868,7 @@ static void __exit powerclamp_exit(void)
{
unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
end_power_clamp();
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
thermal_cooling_device_unregister(cooling_dev);
kfree(cpu_clamping_mask);
--
1.8.5.6
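For context, the per-CPU state that the hunks above operate on looks roughly
like this (reconstructed only from the fields used in the diff; the actual
definition in the patch may differ):

        /* Sketch of struct powerclamp_worker_data, inferred from the hunks. */
        struct powerclamp_worker_data {
                struct kthread_worker *worker;
                struct kthread_work balancing_work;
                struct delayed_kthread_work idle_injection_work;
                struct timer_list wakeup_timer;
                unsigned long cpu;
                unsigned int count;
                bool clamping;
        };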
On Wed, Nov 18, 2015 at 02:25:05PM +0100, Petr Mladek wrote:
> My intention is to make it easier to manipulate and maintain kthreads.
> Especially, I want to replace all the custom main cycles with a
> generic one. Also I want to make the kthreads sleep in a consistent
> state in a common place when there is no work.
>
> My first attempt was with a brand new API (iterant kthread), see
> http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
> directed to improve the existing kthread worker API. This is
> the 3rd iteration of the new direction.
>
>
> 1st patch: add support to check if a timer callback is being called
>
> 2nd..12th patches: improve the existing kthread worker API
>
> 13th..18th, 20th, 22nd patches: convert several kthreads into
> the kthread worker API, namely: khugepaged, ring buffer
> benchmark, hung_task, kmemleak, ipmi, IB/fmr_pool,
> memstick/r592, intel_powerclamp
>
> 21st, 23rd patches: do some preparation steps; they usually do
> some clean up that makes sense even without the conversion.
>
>
> Changes against v2:
>
> + used worker->lock to synchronize the operations with the work
> instead of the PENDING bit as suggested by Tejun Heo; it simplified
> the implementation in several ways
>
> + added timer_active(); used it together with del_timer_sync()
> to cancel the work a less tricky way
>
> + removed the controversial conversion of the RCU kthreads
Thank you! ;-)
Thanx, Paul
On Wed, 18 Nov 2015, Petr Mladek wrote:
> timer_pending() checks whether the list of callbacks is empty.
> Each callback is removed from the list before it is called,
> see call_timer_fn() in __run_timers().
>
> Sometimes we need to make sure that the callback has finished.
> For example, if we want to free some resources that are accessed
> by the callback.
>
> For this purpose, this patch adds timer_active(). It checks both
> the list of callbacks and the running_timer. It takes the base_lock
> to see a consistent state.
>
> I plan to use it to implement delayed works in kthread worker.
> But I guess that it will have wider use. In fact, I wonder if
> timer_pending() is misused in some situations.
Well. That's nice and good. But how will that new function solve
anything? After you drop the lock the state is no longer valid.
Thanks,
tglx
On Wed 2015-11-18 23:32:28, Thomas Gleixner wrote:
> On Wed, 18 Nov 2015, Petr Mladek wrote:
> > timer_pending() checks whether the list of callbacks is empty.
> > Each callback is removed from the list before it is called,
> > see call_timer_fn() in __run_timers().
> >
> > Sometimes we need to make sure that the callback has finished.
> > For example, if we want to free some resources that are accessed
> > by the callback.
> >
> > For this purpose, this patch adds timer_active(). It checks both
> > the list of callbacks and the running_timer. It takes the base_lock
> > to see a consistent state.
> >
> > I plan to use it to implement delayed works in kthread worker.
> > But I guess that it will have wider use. In fact, I wonder if
> > timer_pending() is misused in some situations.
>
> Well. That's nice and good. But how will that new function solve
> anything? After you drop the lock the state is not longer valid.
If we prevent anyone from setting up the timer and timer_pending()
returns false, we are sure that the timer will stay as is.
For example, I use it in the function try_to_cancel_kthread_work().
Any manipulation with the timer is protected by worker->lock.
If the timer is not pending but still active, I have to drop
the lock and busy wait for the timer callback. See
http://thread.gmane.org/gmane.linux.kernel.mm/141493/focus=141501
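For reference, timer_active() could look roughly like this (a sketch only,
based on the description above; helper and field names such as
lock_timer_base() and base->running_timer are assumptions and the real
patch may differ):

        /*
         * Sketch: report whether the timer is pending or its callback is
         * currently running, checked consistently under the base lock.
         */
        static bool timer_active(struct timer_list *timer)
        {
                struct tvec_base *base;
                unsigned long flags;
                bool ret;

                base = lock_timer_base(timer, &flags);
                ret = timer_pending(timer) || base->running_timer == timer;
                spin_unlock_irqrestore(&base->lock, flags);

                return ret;
        }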
Also I wonder if the following usage in
drivers/infiniband/hw/nes/nes_cm.c is safe:
static int mini_cm_dealloc_core(struct nes_cm_core *cm_core)
{
        nes_debug(NES_DBG_CM, "De-Alloc CM Core (%p)\n", cm_core);

        if (!cm_core)
                return -EINVAL;

        barrier();

        if (timer_pending(&cm_core->tcp_timer))
                del_timer(&cm_core->tcp_timer);

        destroy_workqueue(cm_core->event_wq);
        destroy_workqueue(cm_core->disconn_wq);
We destroy the workqueue but the timer callback might still
be in progress and queue new work.
There are many more locations where I see the pattern:
        if (timer_pending())
                del_timer();
        clean_up_stuff();

IMHO, we should use:

        if (timer_active())
                del_timer_sync();
        /* really safe to free stuff */
        clean_up_stuff();

or just

        del_timer_sync();
        clean_up_stuff();
I wonder if timer_pending() is used in more racy scenarios. Or maybe
I am just missing something that makes it all safe.
Thanks,
Petr
On Wed, Nov 18, 2015 at 02:25:23PM +0100, Petr Mladek wrote:
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing,
> awakening. In many cases it is unclear to say in which state
> it is and sometimes it is done a wrong way.
>
> The plan is to convert kthreads into kthread_worker or workqueues
> API. It allows to split the functionality into separate operations.
> It helps to make a better structure. Also it defines a clean state
> where no locks are taken, IRQs blocked, the kthread might sleep
> or even be safely migrated.
>
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that it is
> available when needed. Also it allows a better control, e.g.
> define a scheduling priority.
>
> This patch converts the frm_pool kthread into the kthread worker
s/frm/fmr
> API because I am not sure how busy the thread is. It is quite
> possible that it does not need a dedicated kthread and workqueues
> would be perfectly fine. Well, the conversion between kthread
> worker API and workqueues is pretty trivial.
>
> The patch moves one iteration from the kthread into the work function.
> It preserves the check for a spurious queuing (wake up). Then it
> processes one request. Finally, it re-queues itself if more requests
> are pending.
>
> Otherwise, wake_up_process() is replaced by queuing the work.
>
> Important: The change is only compile tested. I did not find an easy
> way to test it in real life.
What are the expectations?
>
> Signed-off-by: Petr Mladek <[email protected]>
> CC: Doug Ledford <[email protected]>
> CC: Sean Hefty <[email protected]>
> CC: Hal Rosenstock <[email protected]>
> CC: [email protected]
> ---
> drivers/infiniband/core/fmr_pool.c | 54 ++++++++++++++++++--------------------
> 1 file changed, 25 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
> index 9f5ad7cc33c8..5f2b06bd14da 100644
> --- a/drivers/infiniband/core/fmr_pool.c
> +++ b/drivers/infiniband/core/fmr_pool.c
> @@ -96,7 +96,8 @@ struct ib_fmr_pool {
> void * arg);
> void *flush_arg;
>
> - struct task_struct *thread;
> + struct kthread_worker *worker;
> + struct kthread_work work;
>
> atomic_t req_ser;
> atomic_t flush_ser;
> @@ -174,29 +175,26 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
> spin_unlock_irq(&pool->pool_lock);
> }
>
> -static int ib_fmr_cleanup_thread(void *pool_ptr)
> +static void ib_fmr_cleanup_func(struct kthread_work *work)
> {
> - struct ib_fmr_pool *pool = pool_ptr;
> + struct ib_fmr_pool *pool = container_of(work, struct ib_fmr_pool, work);
>
> - do {
> - if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
> - ib_fmr_batch_release(pool);
> -
> - atomic_inc(&pool->flush_ser);
> - wake_up_interruptible(&pool->force_wait);
> + /*
> + * The same request might be queued twice when it appears and
> + * by re-queuing from this work.
> + */
> + if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0)
> + return;
>
> - if (pool->flush_function)
> - pool->flush_function(pool, pool->flush_arg);
> - }
> + ib_fmr_batch_release(pool);
> + atomic_inc(&pool->flush_ser);
> + wake_up_interruptible(&pool->force_wait);
>
> - set_current_state(TASK_INTERRUPTIBLE);
> - if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
> - !kthread_should_stop())
> - schedule();
> - __set_current_state(TASK_RUNNING);
> - } while (!kthread_should_stop());
> + if (pool->flush_function)
> + pool->flush_function(pool, pool->flush_arg);
>
> - return 0;
> + if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
> + queue_kthread_work(pool->worker, &pool->work);
> }
>
> /**
> @@ -286,15 +284,13 @@ struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd *pd,
> atomic_set(&pool->flush_ser, 0);
> init_waitqueue_head(&pool->force_wait);
>
> - pool->thread = kthread_run(ib_fmr_cleanup_thread,
> - pool,
> - "ib_fmr(%s)",
> - device->name);
> - if (IS_ERR(pool->thread)) {
> - printk(KERN_WARNING PFX "couldn't start cleanup thread\n");
> - ret = PTR_ERR(pool->thread);
> + pool->worker = create_kthread_worker(0, "ib_fmr(%s)", device->name);
Does this patch depend on some other patch?
> + if (IS_ERR(pool->worker)) {
> + pr_warn(PFX "couldn't start cleanup kthread worker\n");
> + ret = PTR_ERR(pool->worker);
> goto out_free_pool;
> }
> + init_kthread_work(&pool->work, ib_fmr_cleanup_func);
>
> {
> struct ib_pool_fmr *fmr;
> @@ -362,7 +358,7 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
> LIST_HEAD(fmr_list);
> int i;
>
> - kthread_stop(pool->thread);
> + destroy_kthread_worker(pool->worker);
> ib_fmr_batch_release(pool);
>
> i = 0;
> @@ -412,7 +408,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
> spin_unlock_irq(&pool->pool_lock);
>
> serial = atomic_inc_return(&pool->req_ser);
> - wake_up_process(pool->thread);
> + queue_kthread_work(pool->worker, &pool->work);
>
> if (wait_event_interruptible(pool->force_wait,
> atomic_read(&pool->flush_ser) - serial >= 0))
> @@ -526,7 +522,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
> list_add_tail(&fmr->list, &pool->dirty_list);
> if (++pool->dirty_len >= pool->dirty_watermark) {
> atomic_inc(&pool->req_ser);
> - wake_up_process(pool->thread);
> + queue_kthread_work(pool->worker, &pool->work);
> }
> }
> }
> --
> 1.8.5.6
>
On 11/18/2015 07:25 AM, Petr Mladek wrote:
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing,
> awakening. In many cases it is unclear to say in which state
> it is and sometimes it is done a wrong way.
>
> The plan is to convert kthreads into kthread_worker or workqueues
> API. It allows to split the functionality into separate operations.
> It helps to make a better structure. Also it defines a clean state
> where no locks are taken, IRQs blocked, the kthread might sleep
> or even be safely migrated.
>
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that it is
> available when needed. Also it allows a better control, e.g.
> define a scheduling priority.
>
> This patch converts kipmi kthread into the kthread worker API because
> it modifies the scheduling priority. The change is quite straightforward.
I think this is correct. That code was hard to get right, but I don't
see where any logic is actually changed.
This also doesn't really look any simpler (you end up with more LOC than
you did before :) ), though it will make things more consistent and
reduce errors and that's a good thing.
My only comment is I would like the worker function named ipmi_worker,
not ipmi_func.
Reviewed-by: Corey Minyard <[email protected]>
> First, we move the per-thread variable "busy_until" into the per-thread
> structure struct smi_info. As a side effect, we could omit one parameter
> in ipmi_thread_busy_wait(). On the other hand, the structure can no
> longer be passed with the const qualifier.
>
> The value of "busy_until" is initialized when the kthread is created.
> Also the scheduling priority is set there. This helps to avoid an extra
> init work.
>
> One iteration of the kthread cycle is moved to a delayed work function.
> The different delays between the cycles are solved the following way:
>
> + immediate cycle (nope) is converted into goto within the same work
>
> + immediate cycle with a possible reschedule is converted into
> re-queuing with a zero delay
>
> + schedule_timeout() is converted into re-queuing with the given
> delay
>
> + interruptible sleep is converted into nothing; the work
> will get queued again from check_start_timer_thread().
> In other words, the external wake_up_process() will get
> replaced by queuing with a zero delay.
>
> Probably the most tricky change is when the worker is being stopped.
> We need to explicitly cancel the work to prevent it from re-queuing.
>
> Signed-off-by: Petr Mladek <[email protected]>
> CC: Corey Minyard <[email protected]>
> CC: [email protected]
> ---
> drivers/char/ipmi/ipmi_si_intf.c | 116 ++++++++++++++++++++++-----------------
> 1 file changed, 66 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index 654f6f36a071..fdb97eaded4b 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -302,7 +302,9 @@ struct smi_info {
> /* Counters and things for the proc filesystem. */
> atomic_t stats[SI_NUM_STATS];
>
> - struct task_struct *thread;
> + struct kthread_worker *worker;
> + struct delayed_kthread_work work;
> + struct timespec64 busy_until;
>
> struct list_head link;
> union ipmi_smi_info_union addr_info;
> @@ -929,8 +931,9 @@ static void check_start_timer_thread(struct smi_info *smi_info)
> if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) {
> smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
>
> - if (smi_info->thread)
> - wake_up_process(smi_info->thread);
> + if (smi_info->worker)
> + mod_delayed_kthread_work(smi_info->worker,
> + &smi_info->work, 0);
>
> start_next_msg(smi_info);
> smi_event_handler(smi_info, 0);
> @@ -1008,10 +1011,10 @@ static inline int ipmi_si_is_busy(struct timespec64 *ts)
> }
>
> static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
> - const struct smi_info *smi_info,
> - struct timespec64 *busy_until)
> + struct smi_info *smi_info)
> {
> unsigned int max_busy_us = 0;
> + struct timespec64 *busy_until = &smi_info->busy_until;
>
> if (smi_info->intf_num < num_max_busy_us)
> max_busy_us = kipmid_max_busy_us[smi_info->intf_num];
> @@ -1042,53 +1045,49 @@ static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
> * (if that is enabled). See the paragraph on kimid_max_busy_us in
> * Documentation/IPMI.txt for details.
> */
> -static int ipmi_thread(void *data)
> +static void ipmi_func(struct kthread_work *work)
> {
> - struct smi_info *smi_info = data;
> + struct smi_info *smi_info = container_of(work, struct smi_info,
> + work.work);
> unsigned long flags;
> enum si_sm_result smi_result;
> - struct timespec64 busy_until;
> + int busy_wait;
>
> - ipmi_si_set_not_busy(&busy_until);
> - set_user_nice(current, MAX_NICE);
> - while (!kthread_should_stop()) {
> - int busy_wait;
> +next:
> + spin_lock_irqsave(&(smi_info->si_lock), flags);
> + smi_result = smi_event_handler(smi_info, 0);
>
> - spin_lock_irqsave(&(smi_info->si_lock), flags);
> - smi_result = smi_event_handler(smi_info, 0);
> + /*
> + * If the driver is doing something, there is a possible
> + * race with the timer. If the timer handler see idle,
> + * and the thread here sees something else, the timer
> + * handler won't restart the timer even though it is
> + * required. So start it here if necessary.
> + */
> + if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
> + smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
>
> - /*
> - * If the driver is doing something, there is a possible
> - * race with the timer. If the timer handler see idle,
> - * and the thread here sees something else, the timer
> - * handler won't restart the timer even though it is
> - * required. So start it here if necessary.
> - */
> - if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
> - smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
> -
> - spin_unlock_irqrestore(&(smi_info->si_lock), flags);
> - busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
> - &busy_until);
> - if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
> - ; /* do nothing */
> - else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
> - schedule();
> - else if (smi_result == SI_SM_IDLE) {
> - if (atomic_read(&smi_info->need_watch)) {
> - schedule_timeout_interruptible(100);
> - } else {
> - /* Wait to be woken up when we are needed. */
> - __set_current_state(TASK_INTERRUPTIBLE);
> - schedule();
> - }
> - } else
> - schedule_timeout_interruptible(1);
> + spin_unlock_irqrestore(&(smi_info->si_lock), flags);
> + busy_wait = ipmi_thread_busy_wait(smi_result, smi_info);
> +
> + if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
> + goto next;
> + if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait) {
> + queue_delayed_kthread_work(smi_info->worker,
> + &smi_info->work, 0);
> + } else if (smi_result == SI_SM_IDLE) {
> + if (atomic_read(&smi_info->need_watch)) {
> + queue_delayed_kthread_work(smi_info->worker,
> + &smi_info->work, 100);
> + } else {
> + /* Nope. Wait to be queued when we are needed. */
> + }
> + } else {
> + queue_delayed_kthread_work(smi_info->worker,
> + &smi_info->work, 1);
> }
> - return 0;
> }
>
> -
> static void poll(void *send_info)
> {
> struct smi_info *smi_info = send_info;
> @@ -1229,17 +1228,29 @@ static int smi_start_processing(void *send_info,
> enable = 1;
>
> if (enable) {
> - new_smi->thread = kthread_run(ipmi_thread, new_smi,
> - "kipmi%d", new_smi->intf_num);
> - if (IS_ERR(new_smi->thread)) {
> + struct kthread_worker *worker;
> +
> + worker = create_kthread_worker(0, "kipmi%d",
> + new_smi->intf_num);
> +
> + if (IS_ERR(worker)) {
> dev_notice(new_smi->dev, "Could not start"
> " kernel thread due to error %ld, only using"
> " timers to drive the interface\n",
> - PTR_ERR(new_smi->thread));
> - new_smi->thread = NULL;
> + PTR_ERR(worker));
> + goto out;
> }
> +
> + ipmi_si_set_not_busy(&new_smi->busy_until);
> + set_user_nice(worker->task, MAX_NICE);
> +
> + init_delayed_kthread_work(&new_smi->work, ipmi_func);
> + queue_delayed_kthread_work(worker, &new_smi->work, 0);
> +
> + new_smi->worker = worker;
> }
>
> +out:
> return 0;
> }
>
> @@ -3414,8 +3425,13 @@ static void check_for_broken_irqs(struct smi_info *smi_info)
>
> static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
> {
> - if (smi_info->thread != NULL)
> - kthread_stop(smi_info->thread);
> + if (smi_info->worker != NULL) {
> + struct kthread_worker *worker = smi_info->worker;
> +
> + smi_info->worker = NULL;
> + cancel_delayed_kthread_work_sync(&smi_info->work);
> + destroy_kthread_worker(worker);
> + }
> if (smi_info->timer_running)
> del_timer_sync(&smi_info->si_timer);
> }
Hello,
On Wed, Nov 18, 2015 at 02:25:12PM +0100, Petr Mladek wrote:
> @@ -610,6 +625,12 @@ repeat:
> if (work) {
> __set_current_state(TASK_RUNNING);
> work->func(work);
> +
> + spin_lock_irq(&worker->lock);
> + /* Allow to queue the work into another worker */
> + if (!kthread_work_pending(work))
> + work->worker = NULL;
> + spin_unlock_irq(&worker->lock);
Doesn't this mean that the work item can't be freed from its callback?
That pattern tends to happen regularly.
Thanks.
--
tejun
Hello,
On Wed, Nov 18, 2015 at 02:25:14PM +0100, Petr Mladek wrote:
> +static int
> +try_to_cancel_kthread_work(struct kthread_work *work,
> + spinlock_t *lock,
> + unsigned long *flags)
> +{
> + int ret = 0;
> +
> + if (work->timer) {
> + /* Try to cancel the timer if pending. */
> + if (del_timer(work->timer)) {
> + ret = 1;
> + goto out;
> + }
> +
> + /* Are we racing with the timer callback? */
> + if (timer_active(work->timer)) {
> + /* Bad luck, need to avoid a deadlock. */
> + spin_unlock_irqrestore(lock, *flags);
> + del_timer_sync(work->timer);
> + ret = -EAGAIN;
> + goto out;
> + }
As the timer side is already kinda trylocking anyway, can't the cancel
path be made simpler? Sth like
        lock(worker);
        work->canceling = true;
        del_timer_sync(work->timer);
        unlock(worker);
And the timer can do (ignoring the multiple worker support, do we even
need that?)
        while (!trylock(worker)) {
                if (work->canceling)
                        return;
                cpu_relax();
        }
        queue;
        unlock(worker);
Thanks.
--
tejun
On Mon 2015-11-23 17:27:03, Tejun Heo wrote:
> Hello,
>
> On Wed, Nov 18, 2015 at 02:25:12PM +0100, Petr Mladek wrote:
> > @@ -610,6 +625,12 @@ repeat:
> > if (work) {
> > __set_current_state(TASK_RUNNING);
> > work->func(work);
> > +
> > + spin_lock_irq(&worker->lock);
> > + /* Allow to queue the work into another worker */
> > + if (!kthread_work_pending(work))
> > + work->worker = NULL;
> > + spin_unlock_irq(&worker->lock);
>
> Doesn't this mean that the work item can't be freed from its callback?
> That pattern tends to happen regularly.
I am not sure if I understand your question. Do you mean switching
work->func during the lifetime of the struct kthread_work? That
should not be affected by the above code.
The above code allows to queue an _unused_ kthread_work into any
kthread_worker. For example, it is needed for khugepaged,
see http://marc.info/?l=linux-kernel&m=144785344924871&w=2
The work is static but the worker can be started/stopped
(allocated/freed) repeatedly. It means that the work needs
to be usable with many workers. But it is associated only
with one worker when being used.
If the work is in use (pending or being processed), we must not
touch work->worker. Otherwise there might be a race, because all
the operations with the work are synchronized using
work->worker->lock.
I hope that it makes sense.
Thanks a lot for feedback,
Petr
On Mon 2015-11-23 17:58:23, Tejun Heo wrote:
> Hello,
>
> On Wed, Nov 18, 2015 at 02:25:14PM +0100, Petr Mladek wrote:
> > +static int
> > +try_to_cancel_kthread_work(struct kthread_work *work,
> > + spinlock_t *lock,
> > + unsigned long *flags)
> > +{
> > + int ret = 0;
> > +
> > + if (work->timer) {
> > + /* Try to cancel the timer if pending. */
> > + if (del_timer(work->timer)) {
> > + ret = 1;
> > + goto out;
> > + }
> > +
> > + /* Are we racing with the timer callback? */
> > + if (timer_active(work->timer)) {
> > + /* Bad luck, need to avoid a deadlock. */
> > + spin_unlock_irqrestore(lock, *flags);
> > + del_timer_sync(work->timer);
> > + ret = -EAGAIN;
> > + goto out;
> > + }
>
> As the timer side is already kinda trylocking anyway, can't the cancel
> path be made simpler? Sth like
>
>         lock(worker);
>         work->canceling = true;
>         del_timer_sync(work->timer);
>         unlock(worker);
>
> And the timer can do (ignoring the multiple worker support, do we even
> need that?)
>
>         while (!trylock(worker)) {
>                 if (work->canceling)
>                         return;
>                 cpu_relax();
>         }
>         queue;
>         unlock(worker);
Why did I not figure this out myself? :-)
Thanks for the hint,
Petr
On Mon 2015-11-23 13:36:06, Corey Minyard wrote:
>
>
> On 11/18/2015 07:25 AM, Petr Mladek wrote:
> > Kthreads are currently implemented as an infinite loop. Each
> > has its own variant of checks for terminating, freezing,
> > awakening. In many cases it is unclear to say in which state
> > it is and sometimes it is done a wrong way.
> >
> > The plan is to convert kthreads into kthread_worker or workqueues
> > API. It allows to split the functionality into separate operations.
> > It helps to make a better structure. Also it defines a clean state
> > where no locks are taken, IRQs blocked, the kthread might sleep
> > or even be safely migrated.
> >
> > The kthread worker API is useful when we want to have a dedicated
> > single thread for the work. It helps to make sure that it is
> > available when needed. Also it allows a better control, e.g.
> > define a scheduling priority.
> >
> > This patch converts kipmi kthread into the kthread worker API because
> > it modifies the scheduling priority. The change is quite straightforward.
>
> I think this is correct. That code was hard to get right, but I don't
> see where any
> logic is actually changed.
I believe that it was hard to get it working.
> This also doesn't really look any simpler (you end up with more LOC than
> you did before :) ),
> though it will make things more consistent and reduce errors and that's
> a good thing.
I have just realized that the original code actually looks racy. For
example, it does:

        __set_current_state(TASK_INTERRUPTIBLE);
        schedule();

without rechecking the condition in between. There might already be a new
message and the kthread might miss the wake_up_process(). A similar problem
exists with the schedule_timeout_interruptible(100); I mean:
CPU 0                                   CPU 1

ipmi_thread()
  spin_lock_irqsave();
  smi_result = smi_event_handler();
  spin_unlock_irqrestore();
  [...]
  else if (smi_result == SI_SM_IDLE)
    /* true */
    if (atomic_read(need_watch)) {
      /* true */

                                        sender()
                                          spin_lock_irqsave()
                                          check_start_timer_thread()
                                            wake_up_process()
                                            /*
                                             * NOPE because kthread
                                             * is not sleeping
                                             */

      schedule_timeout_interruptible(100);
      /*
       * We sleep 100 jiffies but
       * there is a pending message.
       */
This is not a problem with the kthread worker API because

        mod_delayed_kthread_work(smi_info->worker, &smi_info->work, 0);

would queue the work to be done immediately and

        queue_delayed_kthread_work(smi_info->worker, &smi_info->work, 100);

would do nothing in this case.
> My only comment is I would like the worker function named ipmi_worker,
> not ipmi_func.
You probably want it because the original name was ipmi_thread. But
it might cause confusion with new_smi->worker. The function gets
assigned to work->func, see struct kthread_work. Therefore I think that
_func suffix makes more sense.
> Reviewed-by: Corey Minyard <[email protected]>
Thanks a lot for review,
Petr
On 11/24/2015 06:12 AM, Petr Mladek wrote:
> On Mon 2015-11-23 13:36:06, Corey Minyard wrote:
>>
>> On 11/18/2015 07:25 AM, Petr Mladek wrote:
>>> Kthreads are currently implemented as an infinite loop. Each
>>> has its own variant of checks for terminating, freezing,
>>> awakening. In many cases it is unclear to say in which state
>>> it is and sometimes it is done a wrong way.
>>>
>>> The plan is to convert kthreads into kthread_worker or workqueues
>>> API. It allows to split the functionality into separate operations.
>>> It helps to make a better structure. Also it defines a clean state
>>> where no locks are taken, IRQs blocked, the kthread might sleep
>>> or even be safely migrated.
>>>
>>> The kthread worker API is useful when we want to have a dedicated
>>> single thread for the work. It helps to make sure that it is
>>> available when needed. Also it allows a better control, e.g.
>>> define a scheduling priority.
>>>
>>> This patch converts kipmi kthread into the kthread worker API because
>>> it modifies the scheduling priority. The change is quite straightforward.
>> I think this is correct. That code was hard to get right, but I don't
>> see where any
>> logic is actually changed.
> I believe that it was hard to make it working.
>
>
>> This also doesn't really look any simpler (you end up with more LOC than
>> you did before :) ),
>> though it will make things more consistent and reduce errors and that's
>> a good thing.
> I have just realized that the original code actually looks racy. For
> example, it does:
>
>         __set_current_state(TASK_INTERRUPTIBLE);
>         schedule();
>
> without rechecking the state in between. There might already be a new
> message and it might miss the wake_up_process(). Similar problem is
> with the schedule_timeout_interruptible(100); I mean:
>
>
> CPU 0                                   CPU 1
>
> ipmi_thread()
>   spin_lock_irqsave();
>   smi_result = smi_event_handler();
>   spin_unlock_irqrestore();
>   [...]
>   else if (smi_result == SI_SM_IDLE)
>     /* true */
>     if (atomic_read(need_watch)) {
>       /* true */
>
>                                         sender()
>                                           spin_lock_irqsave()
>                                           check_start_timer_thread()
>                                             wake_up_process()
>                                             /*
>                                              * NOPE because kthread
>                                              * is not sleeping
>                                              */
>
>       schedule_timeout_interruptible(100);
>       /*
>        * We sleep 100 jiffies but
>        * there is a pending message.
>        */
Yes, I knew the code was racy, but this is a performance optimization and
it wasn't that important to get it perfect. The thread wouldn't actually
wait 100 jiffies, it would just be run by timer interrupts for that time.
>
> This is not a problem with the kthread worker API because
>
>         mod_delayed_kthread_work(smi_info->worker, &smi_info->work, 0);
>
> would queue the work to be done immediately and
>
>         queue_delayed_kthread_work(smi_info->worker, &smi_info->work, 100);
>
> would do nothing in this case.
And indeed this is a lot better.
>
>> My only comment is I would like the worker function named ipmi_worker,
>> not ipmi_func.
> You probably want it because the original name was ipmi_thread. But
> it might cause confusion with new_smi->worker. The function gets
> assigned to work->func, see struct kthread_work. Therefore I think that
> _func suffix makes more sense.
My problem with _func is that it's way too generic. Is this a function
that handles IPMI messages? Message-done handling? I'm not enamored
with my name, but I want something that gives a better indication of
what the function does. ipmi_kthread_worker_func() would be fine with me.
Thanks,
-corey
>> Reviewed-by: Corey Minyard <[email protected]>
>
> Thanks a lot for review,
> Petr
Hello, Petr.
On Tue, Nov 24, 2015 at 11:06:50AM +0100, Petr Mladek wrote:
> > > @@ -610,6 +625,12 @@ repeat:
> > > if (work) {
> > > __set_current_state(TASK_RUNNING);
> > > work->func(work);
> > > +
> > > + spin_lock_irq(&worker->lock);
> > > + /* Allow to queue the work into another worker */
> > > + if (!kthread_work_pending(work))
> > > + work->worker = NULL;
> > > + spin_unlock_irq(&worker->lock);
> >
> > Doesn't this mean that the work item can't be freed from its callback?
> > That pattern tends to happen regularly.
>
> I am not sure if I understand your question. Do you mean switching
> work->func during the life time of the struct kthread_work? This
> should not be affected by the above code.
So, something like the following.
void my_work_fn(work)
{
struct my_struct *s = container_of(work, ...);
do something with s;
kfree(s);
}
and the queuer does
struct my_struct *s = kmalloc(sizeof(*s));
init s and s->work;
queue(&s->work);
expecting s to be freed on completion. IOW, you can't expect the work
item to remain accessible once the work function starts executing.
> The above code allows to queue an _unused_ kthread_work into any
> kthread_worker. For example, it is needed for khugepaged,
> see http://marc.info/?l=linux-kernel&m=144785344924871&w=2
> The work is static but the worker can be started/stopped
> (allocated/freed) repeatedly. It means that the work need
> to be usable with many workers. But it is associated only
> with one worker when being used.
It can just re-init work items when it restarts workers, right?
Thanks.
--
tejun
On Tue, Nov 24, 2015 at 11:06:50AM +0100, Petr Mladek wrote:
> On Mon 2015-11-23 17:27:03, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Nov 18, 2015 at 02:25:12PM +0100, Petr Mladek wrote:
> > > @@ -610,6 +625,12 @@ repeat:
> > > if (work) {
> > > __set_current_state(TASK_RUNNING);
> > > work->func(work);
> > > +
> > > + spin_lock_irq(&worker->lock);
> > > + /* Allow to queue the work into another worker */
> > > + if (!kthread_work_pending(work))
> > > + work->worker = NULL;
> > > + spin_unlock_irq(&worker->lock);
> >
> > Doesn't this mean that the work item can't be freed from its callback?
> > That pattern tends to happen regularly.
>
> I am not sure if I understand your question. Do you mean switching
> work->func during the life time of the struct kthread_work? This
> should not be affected by the above code.
No, work->func(work) doing: kfree(work).
That is indeed something quite frequently done, and since you now have
references to work after calling func, things would go *boom* rather
quickly.
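In other words, the worker loop must treat the work item as gone once
work->func() returns. A safe shape looks roughly like this (a simplified
sketch, not the exact mainline code):

        work = NULL;
        spin_lock_irq(&worker->lock);
        if (!list_empty(&worker->work_list)) {
                work = list_first_entry(&worker->work_list,
                                        struct kthread_work, node);
                list_del_init(&work->node);
        }
        worker->current_work = work;
        spin_unlock_irq(&worker->lock);

        if (work) {
                __set_current_state(TASK_RUNNING);
                work->func(work);
                /* 'work' may have been freed by func(); do not touch it. */
        }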
On Tue 2015-11-24 09:49:42, Tejun Heo wrote:
> Hello, Petr.
>
> On Tue, Nov 24, 2015 at 11:06:50AM +0100, Petr Mladek wrote:
> > > > @@ -610,6 +625,12 @@ repeat:
> > > > if (work) {
> > > > __set_current_state(TASK_RUNNING);
> > > > work->func(work);
> > > > +
> > > > + spin_lock_irq(&worker->lock);
> > > > + /* Allow to queue the work into another worker */
> > > > + if (!kthread_work_pending(work))
> > > > + work->worker = NULL;
> > > > + spin_unlock_irq(&worker->lock);
> > >
> > > Doesn't this mean that the work item can't be freed from its callback?
> > > That pattern tends to happen regularly.
> >
> > I am not sure if I understand your question. Do you mean switching
> > work->func during the life time of the struct kthread_work? This
> > should not be affected by the above code.
>
> IOW, you can't expect the work
> item to remain accessible once the work function starts executing.
I see, I was not aware of this pattern.
> > The above code allows to queue an _unused_ kthread_work into any
> > kthread_worker. For example, it is needed for khugepaged,
> > see http://marc.info/?l=linux-kernel&m=144785344924871&w=2
> > The work is static but the worker can be started/stopped
> > (allocated/freed) repeatedly. It means that the work need
> > to be usable with many workers. But it is associated only
> > with one worker when being used.
>
> It can just re-init work items when it restarts workers, right?
Yes, this would work. It might be slightly inconvenient but it looks
like a good compromise. It helps to keep the API implementation rather
simple and robust.
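For illustration, the re-init approach could look roughly like this
(a sketch only; the khugepaged function names are illustrative, not
the actual patch):

        /* Static work item that survives worker restarts. */
        static struct kthread_work khugepaged_work;

        static int khugepaged_start(void)
        {
                struct kthread_worker *worker;

                worker = create_kthread_worker(0, "khugepaged");
                if (IS_ERR(worker))
                        return PTR_ERR(worker);

                /* Re-init clears any stale work->worker from the last run. */
                init_kthread_work(&khugepaged_work, khugepaged_do_scan_func);
                queue_kthread_work(worker, &khugepaged_work);

                return 0;
        }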
Alternatively, we could allow queuing the work on another worker
if it is not pending. But then we would need to check the pending
status without holding worker->lock, because work->worker might point
to an already freed worker. We need to check the pending status in
many situations, so this might open a can of worms that I would
rather avoid.
Thank you and PeterZ for the explanation,
Petr
On Mon, Nov 23, 2015 at 2:58 PM, Tejun Heo <[email protected]> wrote:
>
> And the timer can do (ignoring the multiple worker support, do we even
> need that?)
>
>         while (!trylock(worker)) {
>                 if (work->canceling)
>                         return;
>                 cpu_relax();
>         }
No no no!
People, you need to learn that code like the above is *not*
acceptable. It's busy-looping on a spinlock, and constantly trying to
*write* to the spinlock.
It will literally crater performance on a multi-socket SMP system if
it ever triggers. We're talking 10x slowdowns, and absolutely
unacceptable cache coherency traffic.
These kinds of loops absolutely *have* to have the read-only part. The
"cpu_relax()" above needs to be a loop that just tests the lock state
by *reading* it, so the cpu_relax() needs to be replaced with
something like
        while (spin_is_locked(lock)) cpu_relax();
instead (possibly just "spin_unlock_wait()" - but the explicit loop
might be worth it if you then want to check the "canceling" flag
independently of the lock state too).
In general, it's very dangerous to try to cook up your own locking
rules. People *always* get it wrong.
Linus
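Putting the two suggestions together, the timer side could look roughly
like this (a sketch only; the 'canceling' flag, the wake-up call, and the
function name are assumptions, not the final patch):

        static void delayed_kthread_work_timer_fn(unsigned long __data)
        {
                struct kthread_work *work = (struct kthread_work *)__data;
                struct kthread_worker *worker = work->worker;

                /* Spin on reads; retry the write only when it looks free. */
                while (!spin_trylock(&worker->lock)) {
                        while (spin_is_locked(&worker->lock)) {
                                if (READ_ONCE(work->canceling))
                                        return;
                                cpu_relax();
                        }
                }

                if (!work->canceling) {
                        list_add_tail(&work->node, &worker->work_list);
                        wake_up_process(worker->task);
                }
                spin_unlock(&worker->lock);
        }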
Hello,
On Tue, Nov 24, 2015 at 12:23:53PM -0800, Linus Torvalds wrote:
> instead (possibly just "spin_unlock_wait()" - but the explicit loop
I see. Wasn't thinking about cache traffic. Yeah, spin_unlock_wait()
seems a lot better.
> might be worth it if you then want to check the "canceling" flag
> independently of the lock state too).
>
> In general, it's very dangerous to try to cook up your own locking
> rules. People *always* get it wrong.
It's either trylock on timer side or timer active spinning trick on
canceling side, so this seems the lesser of the two evils.
Thanks.
--
tejun
On Tue, Nov 24, 2015 at 12:28 PM, Tejun Heo <[email protected]> wrote:
>>
>> In general, it's very dangerous to try to cook up your own locking
>> rules. People *always* get it wrong.
>
> It's either trylock on timer side or timer active spinning trick on
> canceling side, so this seems the lesser of the two evils.
I'm not saying the approach is wrong.
I'm saying that people need to realize that locking is harder than
they think, and not cook up their own lock primitives using things
like trylock without really thinking about it a *lot*.
Basically, "trylock()" on its own should never be used in a loop. The
main use for trylock should be one of:
- thing that you can just not do at all if you can't get the lock
- avoiding ABBA deadlocks: if you have a A->B locking order, but you
already hold B, instead of "drop B, then take A and B in the right
order", you may decide to first "trylock(A)" - and if that fails you
then fall back on the "drop and relock in the right order".
but if what you want to create is a "get lock using trylock", you need
to be very aware of the cache coherency traffic issue at least.
It is possible that we should think about trying to introduce a new
primitive for that "loop_try_lock()" thing. But it's probably not
common enough to be worth it - we've had this issue before, but I
think it's a "once every couple of years" kind of thing rather than
anything that we need to worry about.
The "locking is hard" issue is very real, though. We've traditionally
had a *lot* of code that tried to do its own locking, and not getting
the memory ordering right etc. Things that happen to work on x86 but
don't on other architectures etc.
Linus
On Wed, 18 Nov 2015, Petr Mladek wrote:
> kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
> ("kthread: Implement park/unpark facility"). It is currently used
> only when enabling new CPU. For this purpose, the newly created
> kthread has to be parked.
>
> The CPU binding is a bit tricky. The kthread is parked when the CPU
> has not been allowed yet. And the CPU is bound when the kthread
> is unparked.
>
> The function would be useful for more per-CPU kthreads, e.g.
> bnx2fc_thread, fcoethread. For this purpose, the newly created
> kthread should stay in the uninterruptible state.
>
> This patch moves the parking into smpboot. It binds the thread
> already when created. Then the function might be used universally.
> Also the behavior is consistent with kthread_create() and
> kthread_create_on_node().
>
> Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>