"[PATCH 1/2] reimplement flush_workqueue()" fixed one race when CPU goes down
while flush_cpu_workqueue() plays with it. But there is another problem, CPU
can die before flush_workqueue() has a chance to call flush_cpu_workqueue().
In that case pending work_structs can migrate to CPU which was already checked,
so we should redo the "for_each_online_cpu(cpu)" loop.
Signed-off-by: Oleg Nesterov <[email protected]>
--- mm-6.20-rc2/kernel/workqueue.c~3_race 2006-12-29 18:37:31.000000000 +0300
+++ mm-6.20-rc2/kernel/workqueue.c 2006-12-30 18:09:07.000000000 +0300
@@ -65,6 +65,7 @@ struct workqueue_struct {
/* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
threads to each one as cpus come/go. */
+static long hotplug_sequence __read_mostly;
static DEFINE_MUTEX(workqueue_mutex);
static LIST_HEAD(workqueues);
@@ -454,10 +455,16 @@ void fastcall flush_workqueue(struct wor
/* Always use first cpu's area. */
flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, singlethread_cpu));
} else {
+ long sequence;
int cpu;
+again:
+ sequence = hotplug_sequence;
for_each_online_cpu(cpu)
flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+
+ if (unlikely(sequence != hotplug_sequence))
+ goto again;
}
mutex_unlock(&workqueue_mutex);
}
@@ -874,6 +881,7 @@ static int __devinit workqueue_cpu_callb
cleanup_workqueue_thread(wq, hotcpu);
list_for_each_entry(wq, &workqueues, list)
take_over_work(wq, hotcpu);
+ hotplug_sequence++;
break;
case CPU_LOCK_RELEASE:
On Sat, 30 Dec 2006 19:10:31 +0300
Oleg Nesterov <[email protected]> wrote:
> "[PATCH 1/2] reimplement flush_workqueue()" fixed one race when CPU goes down
> while flush_cpu_workqueue() plays with it. But there is another problem, CPU
> can die before flush_workqueue() has a chance to call flush_cpu_workqueue().
> In that case pending work_structs can migrate to CPU which was already checked,
> so we should redo the "for_each_online_cpu(cpu)" loop.
>
I have a mental note that these:
extend-notifier_call_chain-to-count-nr_calls-made.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch
handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
should be scrapped. But really I forget what their status is. Gautham,
can you please remind us where we're at?
Hi Andrew,
Sorry, I am yet to check out Venki's and Oleg's patches as I
just returned from Vacation.
On Tue, Jan 02, 2007 at 04:27:27PM -0800, Andrew Morton wrote:
>
> I have a mental note that these:
>
> extend-notifier_call_chain-to-count-nr_calls-made.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch
These patches are needed because they allow us to send out the "failed"
notifications to only those subsystems that received the "prepare"
notifications earlier.
> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch
These were posted inorder to have a common place where the subsystems
could lock their per-subsystem hotplug mutexes/semaphore from within the
cpu-hotplug-callback function. Hence they are needed IMO.
> eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
> eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch
These patches define and use a mutex to handle cpu-hotplug and eliminate
the use of lock_cpu_hotplug in sched.c. Hence they are still needed.
> handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
Again, this one ensures that workqueue_mutex is taken/released on
CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
function. So this one is required, unless it conflicts with what Oleg
has posted. Will check that out tonite.
>
> should be scrapped. But really I forget what their status is. Gautham,
> can you please remind us where we're at?
>
If all goes fine (w.r.t cpufreq and workqueue), eliminating
lock_cpu_hotplug from kernel/*.c should be relatively easy.<fingers crossed>
Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
>
> > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
>
> Again, this one ensures that workqueue_mutex is taken/released on
> CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
> function. So this one is required, unless it conflicts with what Oleg
> has posted. Will check that out tonite.
We would still be needing this patch as it's complementing what Oleg has
posted.
Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
On 01/03, Gautham R Shenoy wrote:
>
> On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
> >
> > > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
> >
> > Again, this one ensures that workqueue_mutex is taken/released on
> > CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
> > function. So this one is required, unless it conflicts with what Oleg
> > has posted. Will check that out tonite.
>
> We would still be needing this patch as it's complementing what Oleg has
> posted.
I thought that these patches don't depend on each other, flush_work/workueue
don't care where cpu-hotplug takes workqueue_mutex, in CPU_LOCK_ACQUIRE or in
CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).
Could you clarify? Just curious.
Oleg.
On Wed, Jan 03, 2007 at 08:26:57PM +0300, Oleg Nesterov wrote:
>
> I thought that these patches don't depend on each other, flush_work/workueue
> don't care where cpu-hotplug takes workqueue_mutex, in CPU_LOCK_ACQUIRE or in
> CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).
>
> Could you clarify? Just curious.
You are right. They don't depend on each other.
The intention behind introducing CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
was to have a standard place where the subsystems could acquire/release
the "cpu hotplug protection" mutex in the cpu_hotplug callback function.
The same can be acheived by acquiring these mutexes in
CPU_UP_PREPARE/CPU_DOWN_PREPARE etc.
This is true for every subsystem that is cpu-hotplug aware.
> Oleg.
>
Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"