2006-12-30 16:09:47

by Oleg Nesterov

Subject: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

"[PATCH 1/2] reimplement flush_workqueue()" fixed one race, where a CPU goes down
while flush_cpu_workqueue() plays with it. But there is another problem: a CPU
can die before flush_workqueue() has a chance to call flush_cpu_workqueue() on it.
In that case its pending work_structs can migrate to a CPU which was already
checked, so we should redo the "for_each_online_cpu(cpu)" loop.

Signed-off-by: Oleg Nesterov <[email protected]>

--- mm-6.20-rc2/kernel/workqueue.c~3_race 2006-12-29 18:37:31.000000000 +0300
+++ mm-6.20-rc2/kernel/workqueue.c 2006-12-30 18:09:07.000000000 +0300
@@ -65,6 +65,7 @@ struct workqueue_struct {

/* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
threads to each one as cpus come/go. */
+static long hotplug_sequence __read_mostly;
static DEFINE_MUTEX(workqueue_mutex);
static LIST_HEAD(workqueues);

@@ -454,10 +455,16 @@ void fastcall flush_workqueue(struct wor
 		/* Always use first cpu's area. */
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, singlethread_cpu));
 	} else {
+		long sequence;
 		int cpu;
+again:
+		sequence = hotplug_sequence;

 		for_each_online_cpu(cpu)
 			flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+
+		if (unlikely(sequence != hotplug_sequence))
+			goto again;
 	}
 	mutex_unlock(&workqueue_mutex);
 }
@@ -874,6 +881,7 @@ static int __devinit workqueue_cpu_callb
 			cleanup_workqueue_thread(wq, hotcpu);
 		list_for_each_entry(wq, &workqueues, list)
 			take_over_work(wq, hotcpu);
+		hotplug_sequence++;
 		break;

 	case CPU_LOCK_RELEASE:
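
The retry idea in the patch above can be tried outside the kernel. What follows is only a minimal single-threaded userspace sketch, not the kernel code: the arrays, cpu_dead(), flush_cpu(), flush_all(), reset_demo() and the kill_after_cpu/cpu_to_kill test hook are all hypothetical names made up for illustration. It assumes that a dying CPU hands its pending work to the lowest online CPU and bumps a sequence counter, so a flusher that has already passed that CPU has to walk the loop again.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

static int pending[NR_CPUS];     /* queued work items per simulated CPU */
static bool online[NR_CPUS];
static long hotplug_sequence;    /* bumped on every simulated CPU death */

/* Hypothetical test hook: kill cpu_to_kill right after kill_after_cpu is flushed. */
static int kill_after_cpu = -1;
static int cpu_to_kill = -1;

/* Simulated CPU_DEAD: migrate the dead CPU's work to the lowest online CPU. */
static void cpu_dead(int dead)
{
	assert(dead >= 0 && dead < NR_CPUS);
	online[dead] = false;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (online[cpu]) {
			pending[cpu] += pending[dead];
			pending[dead] = 0;
			break;
		}
	}
	hotplug_sequence++;
}

/* Drain one CPU's queue, then fire the test hook if it is armed for this CPU. */
static void flush_cpu(int cpu)
{
	pending[cpu] = 0;
	if (cpu == kill_after_cpu && cpu_to_kill >= 0) {
		int dead = cpu_to_kill;

		cpu_to_kill = -1;	/* fire only once */
		cpu_dead(dead);
	}
}

/* Flush every online CPU; with retry set, redo the loop if a CPU died meanwhile.
 * Returns the number of passes over the CPUs. */
static int flush_all(bool retry)
{
	long sequence;
	int passes = 0;
again:
	sequence = hotplug_sequence;
	passes++;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu])
			flush_cpu(cpu);

	if (retry && sequence != hotplug_sequence)
		goto again;
	return passes;
}

static int total_pending(void)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += pending[cpu];
	return sum;
}

/* One work item per CPU; CPU 3 will die just after CPU 1 has been flushed. */
static void reset_demo(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		pending[cpu] = 1;
		online[cpu] = true;
	}
	hotplug_sequence = 0;
	kill_after_cpu = 1;
	cpu_to_kill = 3;
}
```

Calling reset_demo() and then flush_all(false) leaves the migrated item sitting on CPU 0, which the loop had already visited. With flush_all(true) the changed sequence number forces a second pass that picks it up.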


2007-01-03 00:28:06

by Andrew Morton

Subject: Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

On Sat, 30 Dec 2006 19:10:31 +0300
Oleg Nesterov <[email protected]> wrote:

> "[PATCH 1/2] reimplement flush_workqueue()" fixed one race, where a CPU goes down
> while flush_cpu_workqueue() plays with it. But there is another problem: a CPU
> can die before flush_workqueue() has a chance to call flush_cpu_workqueue() on it.
> In that case its pending work_structs can migrate to a CPU which was already
> checked, so we should redo the "for_each_online_cpu(cpu)" loop.
>

I have a mental note that these:

extend-notifier_call_chain-to-count-nr_calls-made.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch
handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch

should be scrapped. But really I forget what their status is. Gautham,
can you please remind us where we're at?

by Gautham R Shenoy

Subject: Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

Hi Andrew,

Sorry, I have yet to check out Venki's and Oleg's patches as I
just returned from vacation.

On Tue, Jan 02, 2007 at 04:27:27PM -0800, Andrew Morton wrote:
>
> I have a mental note that these:
>
> extend-notifier_call_chain-to-count-nr_calls-made.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
> extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch

These patches are needed because they allow us to send out the "failed"
notifications to only those subsystems that received the "prepare"
notifications earlier.
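
To make the nr_calls idea concrete, here is a minimal userspace sketch. It is not the kernel's notifier API: call_chain(), bring_up(), the EV_* events, the NOTIFY_* values and the three demo subsystems are hypothetical names chosen for illustration. The sketch also assumes that the subsystem which vetoes "prepare" cleans up after itself, so the rollback goes only to the callbacks that ran before it.

```c
#include <assert.h>
#include <stddef.h>

enum { EV_PREPARE, EV_FAILED };	/* hypothetical events */
#define NOTIFY_OK  0
#define NOTIFY_BAD 1

typedef int (*notifier_fn)(int event);

/* Call up to nr_to_call callbacks (-1 means all) and report how many ran. */
static int call_chain(notifier_fn *chain, size_t len, int event,
		      int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_OK;

	*nr_calls = 0;
	for (size_t i = 0; i < len && nr_to_call != 0; i++, nr_to_call--) {
		ret = chain[i](event);
		(*nr_calls)++;
		if (ret == NOTIFY_BAD)
			break;	/* a veto stops the chain */
	}
	return ret;
}

/* Send "prepare" to everyone; on a veto, send "failed" only to the
 * callbacks that ran before the vetoing one (an assumption of this sketch). */
static int bring_up(notifier_fn *chain, size_t len)
{
	int nr_calls, unused;
	int ret = call_chain(chain, len, EV_PREPARE, -1, &nr_calls);

	if (ret == NOTIFY_BAD)
		call_chain(chain, len, EV_FAILED, nr_calls - 1, &unused);
	return ret;
}

/* Three demo subsystems; the second one vetoes "prepare". */
static int seen_prepare[3], seen_failed[3];

static int notify(int id, int event, int veto)
{
	if (event == EV_PREPARE) {
		seen_prepare[id]++;
		return veto ? NOTIFY_BAD : NOTIFY_OK;
	}
	seen_failed[id]++;
	return NOTIFY_OK;
}

static int sub_a(int event) { return notify(0, event, 0); }
static int sub_b(int event) { return notify(1, event, 1); }
static int sub_c(int event) { return notify(2, event, 0); }

static notifier_fn demo_chain[] = { sub_a, sub_b, sub_c };
```

Running bring_up(demo_chain, 3) delivers "prepare" to sub_a and sub_b and "failed" to sub_a alone. sub_c never saw "prepare", so it is not asked to undo anything.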

> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
> define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch

These were posted in order to have a common place where the subsystems
could lock their per-subsystem hotplug mutexes/semaphores from within the
cpu-hotplug callback function. Hence they are needed IMO.

> eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
> eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch

These patches define and use a mutex to handle cpu-hotplug and eliminate
the use of lock_cpu_hotplug in sched.c. Hence they are still needed.

> handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch

Again, this one ensures that workqueue_mutex is taken/released on
CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
function. So this one is required, unless it conflicts with what Oleg
has posted. Will check that out tonight.

>
> should be scrapped. But really I forget what their status is. Gautham,
> can you please remind us where we're at?
>

If all goes fine (w.r.t. cpufreq and workqueue), eliminating
lock_cpu_hotplug from kernel/*.c should be relatively easy. <fingers crossed>

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

by Gautham R Shenoy

Subject: Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
>
> > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
>
> Again, this one ensures that workqueue_mutex is taken/released on
> CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
> function. So this one is required, unless it conflicts with what Oleg
> has posted. Will check that out tonight.

We would still need this patch, as it complements what Oleg has
posted.

Thanks and Regards
gautham.

2007-01-03 17:26:45

by Oleg Nesterov

Subject: Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

On 01/03, Gautham R Shenoy wrote:
>
> On Wed, Jan 03, 2007 at 07:34:59PM +0530, Gautham R Shenoy wrote:
> >
> > > handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch
> >
> > Again, this one ensures that workqueue_mutex is taken/released on
> > CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in the cpuhotplug callback
> > function. So this one is required, unless it conflicts with what Oleg
> > has posted. Will check that out tonight.
>
> We would still need this patch, as it complements what Oleg has
> posted.

I thought that these patches don't depend on each other: flush_work/workqueue
don't care where cpu-hotplug takes workqueue_mutex, in the CPU_LOCK_ACQUIRE or in
the CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).

Could you clarify? Just curious.

Oleg.

by Gautham R Shenoy

Subject: Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

On Wed, Jan 03, 2007 at 08:26:57PM +0300, Oleg Nesterov wrote:
>
> I thought that these patches don't depend on each other: flush_work/workqueue
> don't care where cpu-hotplug takes workqueue_mutex, in the CPU_LOCK_ACQUIRE or in
> the CPU_UP_PREPARE case (or CPU_DEAD/CPU_LOCK_RELEASE for unlock).
>
> Could you clarify? Just curious.

You are right. They don't depend on each other.

The intention behind introducing CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
was to have a standard place where the subsystems could acquire/release
the "cpu hotplug protection" mutex in the cpu_hotplug callback function.

The same can be achieved by acquiring these mutexes in
CPU_UP_PREPARE/CPU_DOWN_PREPARE etc.

This is true for every subsystem that is cpu-hotplug aware.
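
The two styles can be put side by side in a small userspace sketch. Everything in it is an assumption made for illustration: the EV_* event names, the fake_mutex stand-in (a flag, not a real lock) and the mutex_held_across() driver are hypothetical, and the failure events (a cancelled bring-up, a failed take-down) are left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names, loosely modelled on the ones discussed above. */
enum hotplug_event {
	EV_LOCK_ACQUIRE, EV_UP_PREPARE, EV_ONLINE,
	EV_DOWN_PREPARE, EV_DEAD, EV_LOCK_RELEASE
};

/* Stand-in for a real mutex: a flag, so the sketch can check lock balance. */
struct fake_mutex { bool held; };

static void fake_lock(struct fake_mutex *m)   { assert(!m->held); m->held = true; }
static void fake_unlock(struct fake_mutex *m) { assert(m->held);  m->held = false; }

static struct fake_mutex subsys_mutex;

/* Style 1: dedicated events bracket the whole hotplug operation. */
static void callback_with_lock_events(enum hotplug_event ev)
{
	switch (ev) {
	case EV_LOCK_ACQUIRE:
		fake_lock(&subsys_mutex);
		break;
	case EV_LOCK_RELEASE:
		fake_unlock(&subsys_mutex);
		break;
	default:
		break;	/* per-event work would run here, mutex held */
	}
}

/* Style 2: lock in the *_PREPARE case, unlock when the operation completes. */
static void callback_without_lock_events(enum hotplug_event ev)
{
	switch (ev) {
	case EV_UP_PREPARE:
	case EV_DOWN_PREPARE:
		fake_lock(&subsys_mutex);
		break;
	case EV_ONLINE:
	case EV_DEAD:
		fake_unlock(&subsys_mutex);
		break;
	default:
		break;
	}
}

/* Drive one successful hotplug operation through a callback and report
 * whether the mutex was held in the middle and released at the end. */
static bool mutex_held_across(void (*cb)(enum hotplug_event),
			      enum hotplug_event prepare,
			      enum hotplug_event done)
{
	bool held_mid;

	cb(EV_LOCK_ACQUIRE);
	cb(prepare);
	held_mid = subsys_mutex.held;
	cb(done);
	cb(EV_LOCK_RELEASE);
	return held_mid && !subsys_mutex.held;
}
```

Both callbacks keep the mutex held between the prepare step and the completion step, which is the protection a flusher cares about. The difference is only where the lock and unlock calls live, so the first style buys a uniform convention and no extra protection.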

> Oleg.
>

Thanks and Regards
gautham.