2014-02-10 08:23:22

by Liu, Chuansheng

[permalink] [raw]
Subject: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

There is below race between irq handler and irq thread:
irq handler irq thread

irq_wake_thread() irq_thread()
set bit RUNTHREAD
... clear bit RUNTHREAD
thread_fn()
[A]test_and_decrease
thread_active
[B]increase thread_active

If action A is before action B, after that the thread_active
will be always > 0, and for synchronize_irq() calling, which
will be waiting there forever.

Here put the increasing thread-active before setting bit
RUNTHREAD, which can resolve such race.

Signed-off-by: xiaoming wang <[email protected]>
Signed-off-by: Chuansheng Liu <[email protected]>
---
kernel/irq/handle.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 131ca17..5f9fbb7 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -65,7 +65,7 @@ static void irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
* Wake up the handler thread for this action. If the
* RUNTHREAD bit is already set, nothing to do.
*/
- if (test_and_set_bit(IRQTF_RUNTHREAD, &action->thread_flags))
+ if (test_bit(IRQTF_RUNTHREAD, &action->thread_flags))
return;

/*
@@ -126,6 +126,25 @@ static void irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
*/
atomic_inc(&desc->threads_active);

+ /*
+ * set the RUNTHREAD bit after increasing the threads_active,
+ * it can avoid the below race:
+ * irq handler irq thread in case it is in
+ * running state
+ *
+ * set RUNTHREAD bit
+ * clear the RUNTHREAD bit
+ *... thread_fn()
+ *
+ * due to threads_active==0,
+ * no waking up wait_for_threads
+ *
+ * threads_active ++
+ * After that, the threads_active will be always > 0, which
+ * will block the synchronize_irq().
+ */
+ set_bit(IRQTF_RUNTHREAD, &action->thread_flags);
+
wake_up_process(action->thread);
}

--
1.9.rc0


2014-02-10 08:23:24

by Liu, Chuansheng

[permalink] [raw]
Subject: [PATCH 2/2] genirq: Fix one typo chasnge

Change the comment "chasnge" to "change".

Signed-off-by: Chuansheng Liu <[email protected]>
---
kernel/irq/manage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 481a13c..4802295 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -727,7 +727,7 @@ out_unlock:

#ifdef CONFIG_SMP
/*
- * Check whether we need to chasnge the affinity of the interrupt thread.
+ * Check whether we need to change the affinity of the interrupt thread.
*/
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)
--
1.9.rc0

2014-02-10 08:58:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Mon, 10 Feb 2014, Chuansheng Liu wrote:
> There is below race between irq handler and irq thread:
> irq handler irq thread
>
> irq_wake_thread() irq_thread()
> set bit RUNTHREAD
> ... clear bit RUNTHREAD
> thread_fn()
> [A]test_and_decrease
> thread_active
> [B]increase thread_active
>
> If action A is before action B, after that the thread_active
> will be always > 0, and for synchronize_irq() calling, which
> will be waiting there forever.

No. thread_active is 0, simply because after the atomic_dec_and_test()
it is -1 and the atomic_inc on the other side will bring it back to 0.

Thanks,

tglx

Subject: [tip:irq/core] genirq: Update the a comment typo

Commit-ID: b04c644e670f79417f1728e6be310cfd8e6a921b
Gitweb: http://git.kernel.org/tip/b04c644e670f79417f1728e6be310cfd8e6a921b
Author: Chuansheng Liu <[email protected]>
AuthorDate: Mon, 10 Feb 2014 16:13:57 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 19 Feb 2014 17:26:34 +0100

genirq: Update the a comment typo

Change the comment "chasnge" to "change".

Signed-off-by: Chuansheng Liu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/irq/manage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 54eb5c9..ada0c54 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -757,7 +757,7 @@ out_unlock:

#ifdef CONFIG_SMP
/*
- * Check whether we need to chasnge the affinity of the interrupt thread.
+ * Check whether we need to change the affinity of the interrupt thread.
*/
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)

2014-02-20 00:35:12

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

Hello Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Monday, February 10, 2014 4:58 PM
> To: Liu, Chuansheng
> Cc: [email protected]; Wang, Xiaoming
> Subject: Re: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
>
> On Mon, 10 Feb 2014, Chuansheng Liu wrote:
> > There is below race between irq handler and irq thread:
> > irq handler irq thread
> >
> > irq_wake_thread() irq_thread()
> > set bit RUNTHREAD
> > ... clear bit RUNTHREAD
> > thread_fn()
> > [A]test_and_decrease
> > thread_active
> > [B]increase thread_active
> >
> > If action A is before action B, after that the thread_active
> > will be always > 0, and for synchronize_irq() calling, which
> > will be waiting there forever.
>
> No. thread_active is 0, simply because after the atomic_dec_and_test()
> it is -1 and the atomic_inc on the other side will bring it back to 0.
>
Yes, you are right. The thread_active is back to 0 at last.

The case we meet is:
1/ T1: blocking at disable_irq() -- > sync_irq() -- > wait_event()
[ 142.678681] [<c1a5b353>] schedule+0x23/0x60
[ 142.683466] [<c12b24c5>] synchronize_irq+0x75/0xb0
[ 142.688931] [<c125fad0>] ? wake_up_bit+0x30/0x30
[ 142.694201] [<c12b33ab>] disable_irq+0x1b/0x20
[ 142.699278] [<c17a79bc>] smb347_shutdown+0x2c/0x50
[ 142.704744] [<c1789f7d>] i2c_device_shutdown+0x2d/0x40
[ 142.710597] [<c1601734>] device_shutdown+0x14/0x140
[ 142.716161] [<c12535f2>] kernel_restart_prepare+0x32/0x40
[ 142.722307] [<c1253613>] kernel_restart+0x13/0x60

2/ The corresponding irq thread is at sleep state:
[ 587.552408] irq/388-SMB0349 S f1c47620 7276 119 2 0x00000000
[ 587.552439] f1d6bf20 00000046 f1c47a48 f1c47620 f1d6bec4 9e91731c 00000001 c1a5f3a5
[ 587.552468] c20469c0 00000001 c20469c0 f36559c0 f1c47620 f307bde0 c20469c0 f1d6bef0
[ 587.552497] 00000296 00000000 00000296 f1d6bef0 c1a5bfa6 f1c47620 f1d6bf14 c126e329
[ 587.552501] Call Trace:
[ 587.552519] [<c1a5f3a5>] ? sub_preempt_count+0x55/0xe0
[ 587.552535] [<c1a5bfa6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
[ 587.552548] [<c126e329>] ? set_cpus_allowed_ptr+0x59/0xe0
[ 587.552563] [<c1a5b093>] schedule+0x23/0x60
[ 587.552576] [<c12b2ae1>] irq_thread+0xa1/0x130
[ 587.552588] [<c12b27f0>] ? irq_thread_dtor+0xa0/0xa0

3/ All the cpus are in the idle task;

So we guess the thread_active is not 0 at that time, but irq thread is doing nothing at that time.
Thought for a long time, but there is no idea, and it is just hit once.

Appreciated if you have some idea, thanks.


2014-02-20 12:52:48

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Thu, 20 Feb 2014, Liu, Chuansheng wrote:
> Hello Thomas,
>
> > -----Original Message-----
> > From: Thomas Gleixner [mailto:[email protected]]
> > Sent: Monday, February 10, 2014 4:58 PM
> > To: Liu, Chuansheng
> > Cc: [email protected]; Wang, Xiaoming
> > Subject: Re: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
> >
> > On Mon, 10 Feb 2014, Chuansheng Liu wrote:
> > > There is below race between irq handler and irq thread:
> > > irq handler irq thread
> > >
> > > irq_wake_thread() irq_thread()
> > > set bit RUNTHREAD
> > > ... clear bit RUNTHREAD
> > > thread_fn()
> > > [A]test_and_decrease
> > > thread_active
> > > [B]increase thread_active
> > >
> > > If action A is before action B, after that the thread_active
> > > will be always > 0, and for synchronize_irq() calling, which
> > > will be waiting there forever.
> >
> > No. thread_active is 0, simply because after the atomic_dec_and_test()
> > it is -1 and the atomic_inc on the other side will bring it back to 0.
> >
> Yes, you are right. The thread_active is back to 0 at last.
>
> The case we meet is:
> 1/ T1: blocking at disable_irq() -- > sync_irq() -- > wait_event()
> [ 142.678681] [<c1a5b353>] schedule+0x23/0x60
> [ 142.683466] [<c12b24c5>] synchronize_irq+0x75/0xb0
> [ 142.688931] [<c125fad0>] ? wake_up_bit+0x30/0x30
> [ 142.694201] [<c12b33ab>] disable_irq+0x1b/0x20
> [ 142.699278] [<c17a79bc>] smb347_shutdown+0x2c/0x50
> [ 142.704744] [<c1789f7d>] i2c_device_shutdown+0x2d/0x40
> [ 142.710597] [<c1601734>] device_shutdown+0x14/0x140
> [ 142.716161] [<c12535f2>] kernel_restart_prepare+0x32/0x40
> [ 142.722307] [<c1253613>] kernel_restart+0x13/0x60
>
> 2/ The corresponding irq thread is at sleep state:
> [ 587.552408] irq/388-SMB0349 S f1c47620 7276 119 2 0x00000000
> [ 587.552439] f1d6bf20 00000046 f1c47a48 f1c47620 f1d6bec4 9e91731c 00000001 c1a5f3a5
> [ 587.552468] c20469c0 00000001 c20469c0 f36559c0 f1c47620 f307bde0 c20469c0 f1d6bef0
> [ 587.552497] 00000296 00000000 00000296 f1d6bef0 c1a5bfa6 f1c47620 f1d6bf14 c126e329
> [ 587.552501] Call Trace:
> [ 587.552519] [<c1a5f3a5>] ? sub_preempt_count+0x55/0xe0
> [ 587.552535] [<c1a5bfa6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
> [ 587.552548] [<c126e329>] ? set_cpus_allowed_ptr+0x59/0xe0
> [ 587.552563] [<c1a5b093>] schedule+0x23/0x60
> [ 587.552576] [<c12b2ae1>] irq_thread+0xa1/0x130
> [ 587.552588] [<c12b27f0>] ? irq_thread_dtor+0xa0/0xa0
>
> 3/ All the cpus are in the idle task;

Lets look at it again:

CPU 0 CPU1

irq handler irq thread
set IRQS_INPROGRESS
...
irq_wake_thread() irq_thread()
set bit RUNTHREAD
... clear bit RUNTHREAD
thread_fn()
atomic_dec_and_test(threads_active) ( 0 -> -1)

atomic_inc(threads_active) ( -1 -> 0)
clr IRQS_INPROGRESS

Now synchronize_irq comes into play, that's what caused you to look
into this.

synchronize_irq() can never observe the -1 state because it is
serialized against IRQS_INPROGESS. And when IRQS_INPROGRESS is
cleared, the threads_active state is back to 0.

I'm really not seing how this can happen. Any chance you can reproduce
this by executing the situation which led to this in a loop?

Thanks,

tglx

2014-02-21 00:53:32

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

Hello Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Thursday, February 20, 2014 8:53 PM
> To: Liu, Chuansheng
> Cc: [email protected]; Wang, Xiaoming
> Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
>
> On Thu, 20 Feb 2014, Liu, Chuansheng wrote:
> > Hello Thomas,
> >
> > > -----Original Message-----
> > > From: Thomas Gleixner [mailto:[email protected]]
> > > Sent: Monday, February 10, 2014 4:58 PM
> > > To: Liu, Chuansheng
> > > Cc: [email protected]; Wang, Xiaoming
> > > Subject: Re: [PATCH 1/2] genirq: Fix the possible synchronize_irq()
> wait-forever
> > >
> > > On Mon, 10 Feb 2014, Chuansheng Liu wrote:
> > > > There is below race between irq handler and irq thread:
> > > > irq handler irq thread
> > > >
> > > > irq_wake_thread() irq_thread()
> > > > set bit RUNTHREAD
> > > > ... clear bit RUNTHREAD
> > > > thread_fn()
> > > > [A]test_and_decrease
> > > > thread_active
> > > > [B]increase thread_active
> > > >
> > > > If action A is before action B, after that the thread_active
> > > > will be always > 0, and for synchronize_irq() calling, which
> > > > will be waiting there forever.
> > >
> > > No. thread_active is 0, simply because after the atomic_dec_and_test()
> > > it is -1 and the atomic_inc on the other side will bring it back to 0.
> > >
> > Yes, you are right. The thread_active is back to 0 at last.
> >
> > The case we meet is:
> > 1/ T1: blocking at disable_irq() -- > sync_irq() -- > wait_event()
> > [ 142.678681] [<c1a5b353>] schedule+0x23/0x60
> > [ 142.683466] [<c12b24c5>] synchronize_irq+0x75/0xb0
> > [ 142.688931] [<c125fad0>] ? wake_up_bit+0x30/0x30
> > [ 142.694201] [<c12b33ab>] disable_irq+0x1b/0x20
> > [ 142.699278] [<c17a79bc>] smb347_shutdown+0x2c/0x50
> > [ 142.704744] [<c1789f7d>] i2c_device_shutdown+0x2d/0x40
> > [ 142.710597] [<c1601734>] device_shutdown+0x14/0x140
> > [ 142.716161] [<c12535f2>] kernel_restart_prepare+0x32/0x40
> > [ 142.722307] [<c1253613>] kernel_restart+0x13/0x60
> >
> > 2/ The corresponding irq thread is at sleep state:
> > [ 587.552408] irq/388-SMB0349 S f1c47620 7276 119 2
> 0x00000000
> > [ 587.552439] f1d6bf20 00000046 f1c47a48 f1c47620 f1d6bec4 9e91731c
> 00000001 c1a5f3a5
> > [ 587.552468] c20469c0 00000001 c20469c0 f36559c0 f1c47620 f307bde0
> c20469c0 f1d6bef0
> > [ 587.552497] 00000296 00000000 00000296 f1d6bef0 c1a5bfa6
> f1c47620 f1d6bf14 c126e329
> > [ 587.552501] Call Trace:
> > [ 587.552519] [<c1a5f3a5>] ? sub_preempt_count+0x55/0xe0
> > [ 587.552535] [<c1a5bfa6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
> > [ 587.552548] [<c126e329>] ? set_cpus_allowed_ptr+0x59/0xe0
> > [ 587.552563] [<c1a5b093>] schedule+0x23/0x60
> > [ 587.552576] [<c12b2ae1>] irq_thread+0xa1/0x130
> > [ 587.552588] [<c12b27f0>] ? irq_thread_dtor+0xa0/0xa0
> >
> > 3/ All the cpus are in the idle task;
>
> Lets look at it again:
>
> CPU 0 CPU1
>
> irq handler irq thread
> set IRQS_INPROGRESS
> ...
> irq_wake_thread() irq_thread()
> set bit RUNTHREAD
> ... clear bit RUNTHREAD
> thread_fn()
> atomic_dec_and_test(threads_active) ( 0 -> -1)
>
> atomic_inc(threads_active) ( -1 -> 0)
> clr IRQS_INPROGRESS
>
> Now synchronize_irq comes into play, that's what caused you to look
> into this.
>
> synchronize_irq() can never observe the -1 state because it is
> serialized against IRQS_INPROGESS. And when IRQS_INPROGRESS is
> cleared, the threads_active state is back to 0.
>
> I'm really not seing how this can happen. Any chance you can reproduce
> this by executing the situation which led to this in a loop?
We can have a try to forcedly reproduce the case of threads_active -1/0.

But feels there is another case which the synchronize_irq waited there forever,
it is no waking up action from irq_thread().

CPU0 CPU1
disable_irq() irq_thread()
synchronize_irq()
wait_event()
adding the __wait into the queue wake_threads_waitq
test threads_active==0
atomic_dec_and_test(threads_active) 1 -- > 0
waitqueue_active(&desc->wait_for_threads)
<== Here without smp_mb(), CPU1 maybe detect
the queue is still empty??
schedule()

It will cause although the threads_active is 0, but irq_thread() didn't do the waking up action.
Is it reasonable? Then maybe we can add one smp_mb() before waitqueue_active.

Thanks.







2014-02-21 10:33:46

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> But feels there is another case which the synchronize_irq waited there forever,
> it is no waking up action from irq_thread().
>
> CPU0 CPU1
> disable_irq() irq_thread()
> synchronize_irq()
> wait_event()
> adding the __wait into the queue wake_threads_waitq
> test threads_active==0
> atomic_dec_and_test(threads_active) 1 -- > 0
> waitqueue_active(&desc->wait_for_threads)
> <== Here without smp_mb(), CPU1 maybe detect
> the queue is still empty??
> schedule()
>
> It will cause although the threads_active is 0, but irq_thread() didn't do the waking up action.
> Is it reasonable? Then maybe we can add one smp_mb() before waitqueue_active.

I think you have a point there, but not on x86 wherre the atomic_dec
and the spinlock on the queueing side are full barriers. For non-x86
there is definitely a potential issue.

Thanks,

tglx


2014-02-21 11:01:55

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

Hello Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Friday, February 21, 2014 6:34 PM
> To: Liu, Chuansheng
> Cc: [email protected]; Wang, Xiaoming
> Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
>
> On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > But feels there is another case which the synchronize_irq waited there
> forever,
> > it is no waking up action from irq_thread().
> >
> > CPU0 CPU1
> > disable_irq() irq_thread()
> > synchronize_irq()
> > wait_event()
> > adding the __wait into the queue wake_threads_waitq
> > test threads_active==0
> > atomic_dec_and_test(threads_active) 1 -- > 0
> >
> waitqueue_active(&desc->wait_for_threads)
> > <== Here without smp_mb(), CPU1
> maybe detect
> > the queue is still empty??
> > schedule()
> >
> > It will cause although the threads_active is 0, but irq_thread() didn't do the
> waking up action.
> > Is it reasonable? Then maybe we can add one smp_mb() before
> waitqueue_active.
>
> I think you have a point there, but not on x86 wherre the atomic_dec
> and the spinlock on the queueing side are full barriers. For non-x86
> there is definitely a potential issue.
>
But even on X86, spin_unlock has no full barrier, the following scenario:
CPU0 CPU1
spin_lock
atomic_dec_and_test
insert into queue
spin_unlock
checking waitqueue_active

Here after inserting into the queue, before waitqueue_active,
there is no mb.

So is it still the case? Thanks.

2014-02-21 11:10:28

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> Hello Thomas,
>
> > -----Original Message-----
> > From: Thomas Gleixner [mailto:[email protected]]
> > Sent: Friday, February 21, 2014 6:34 PM
> > To: Liu, Chuansheng
> > Cc: [email protected]; Wang, Xiaoming
> > Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
> >
> > On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > > But feels there is another case which the synchronize_irq waited there
> > forever,
> > > it is no waking up action from irq_thread().
> > >
> > > CPU0 CPU1
> > > disable_irq() irq_thread()
> > > synchronize_irq()
> > > wait_event()
> > > adding the __wait into the queue wake_threads_waitq
> > > test threads_active==0
> > > atomic_dec_and_test(threads_active) 1 -- > 0
> > >
> > waitqueue_active(&desc->wait_for_threads)
> > > <== Here without smp_mb(), CPU1
> > maybe detect
> > > the queue is still empty??
> > > schedule()
> > >
> > > It will cause although the threads_active is 0, but irq_thread() didn't do the
> > waking up action.
> > > Is it reasonable? Then maybe we can add one smp_mb() before
> > waitqueue_active.
> >
> > I think you have a point there, but not on x86 wherre the atomic_dec
> > and the spinlock on the queueing side are full barriers. For non-x86
> > there is definitely a potential issue.
> >
> But even on X86, spin_unlock has no full barrier, the following scenario:
> CPU0 CPU1
> spin_lock
> atomic_dec_and_test
> insert into queue
> spin_unlock
> checking waitqueue_active

But CPU0 sees the 0, right?

Thanks,

tglx

2014-02-21 11:26:14

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever



> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Friday, February 21, 2014 7:11 PM
> To: Liu, Chuansheng
> Cc: [email protected]; Wang, Xiaoming
> Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
>
> On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > Hello Thomas,
> >
> > > -----Original Message-----
> > > From: Thomas Gleixner [mailto:[email protected]]
> > > Sent: Friday, February 21, 2014 6:34 PM
> > > To: Liu, Chuansheng
> > > Cc: [email protected]; Wang, Xiaoming
> > > Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq()
> wait-forever
> > >
> > > On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > > > But feels there is another case which the synchronize_irq waited there
> > > forever,
> > > > it is no waking up action from irq_thread().
> > > >
> > > > CPU0 CPU1
> > > > disable_irq() irq_thread()
> > > > synchronize_irq()
> > > > wait_event()
> > > > adding the __wait into the queue wake_threads_waitq
> > > > test threads_active==0
> > > > atomic_dec_and_test(threads_active) 1 -- > 0
> > > >
> > > waitqueue_active(&desc->wait_for_threads)
> > > > <== Here without smp_mb(),
> CPU1
> > > maybe detect
> > > > the queue is still empty??
> > > > schedule()
> > > >
> > > > It will cause although the threads_active is 0, but irq_thread() didn't do
> the
> > > waking up action.
> > > > Is it reasonable? Then maybe we can add one smp_mb() before
> > > waitqueue_active.
> > >
> > > I think you have a point there, but not on x86 wherre the atomic_dec
> > > and the spinlock on the queueing side are full barriers. For non-x86
> > > there is definitely a potential issue.
> > >
> > But even on X86, spin_unlock has no full barrier, the following scenario:
> > CPU0 CPU1
> > spin_lock
> > atomic_dec_and_test
> > insert into queue
> > spin_unlock
> > checking waitqueue_active
>
> But CPU0 sees the 0, right?
Not be clear here:)
The atomic_read has no barrier.

Found commit 6cb2a21049b89 has one similar smp_mb() calling before
waitqueue_active() on one X86 CPU.


2014-02-21 11:52:50

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > > > I think you have a point there, but not on x86 wherre the atomic_dec
> > > > and the spinlock on the queueing side are full barriers. For non-x86
> > > > there is definitely a potential issue.
> > > >
> > > But even on X86, spin_unlock has no full barrier, the following scenario:
> > > CPU0 CPU1
> > > spin_lock
> > > atomic_dec_and_test
> > > insert into queue
> > > spin_unlock
> > > checking waitqueue_active
> >
> > But CPU0 sees the 0, right?
> Not be clear here:)
> The atomic_read has no barrier.
>
> Found commit 6cb2a21049b89 has one similar smp_mb() calling before
> waitqueue_active() on one X86 CPU.

Indeed, you are completely right. Great detective work!

I'm inclined to remove the waitqueue_active() alltogether. It's
creating more headache than it's worth.

Thanks,

tglx

2014-02-21 12:29:55

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

Hello Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:[email protected]]
> Sent: Friday, February 21, 2014 7:53 PM
> To: Liu, Chuansheng
> Cc: [email protected]; Wang, Xiaoming
> Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
>
> On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > > > > I think you have a point there, but not on x86 wherre the atomic_dec
> > > > > and the spinlock on the queueing side are full barriers. For non-x86
> > > > > there is definitely a potential issue.
> > > > >
> > > > But even on X86, spin_unlock has no full barrier, the following scenario:
> > > > CPU0 CPU1
> > > > spin_lock
> > > > atomic_dec_and_test
> > > > insert into queue
> > > > spin_unlock
> > > > checking waitqueue_active
> > >
> > > But CPU0 sees the 0, right?
> > Not be clear here:)
> > The atomic_read has no barrier.
> >
> > Found commit 6cb2a21049b89 has one similar smp_mb() calling before
> > waitqueue_active() on one X86 CPU.
>
> Indeed, you are completely right. Great detective work!
Thanks your encouraging.

>
> I'm inclined to remove the waitqueue_active() alltogether. It's
> creating more headache than it's worth.
If I am understanding well, removing the checking of waitqueue_active(),
and call wakeup() directly which will check list with spinlock protection.

If so, I can prepare one patch for it:)

2014-02-21 13:05:54

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever

On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > -----Original Message-----
> > From: Thomas Gleixner [mailto:[email protected]]
> > Sent: Friday, February 21, 2014 7:53 PM
> > To: Liu, Chuansheng
> > Cc: [email protected]; Wang, Xiaoming
> > Subject: RE: [PATCH 1/2] genirq: Fix the possible synchronize_irq() wait-forever
> >
> > On Fri, 21 Feb 2014, Liu, Chuansheng wrote:
> > > > > > I think you have a point there, but not on x86 wherre the atomic_dec
> > > > > > and the spinlock on the queueing side are full barriers. For non-x86
> > > > > > there is definitely a potential issue.
> > > > > >
> > > > > But even on X86, spin_unlock has no full barrier, the following scenario:
> > > > > CPU0 CPU1
> > > > > spin_lock
> > > > > atomic_dec_and_test
> > > > > insert into queue
> > > > > spin_unlock
> > > > > checking waitqueue_active
> > > >
> > > > But CPU0 sees the 0, right?
> > > Not be clear here:)
> > > The atomic_read has no barrier.
> > >
> > > Found commit 6cb2a21049b89 has one similar smp_mb() calling before
> > > waitqueue_active() on one X86 CPU.
> >
> > Indeed, you are completely right. Great detective work!
> Thanks your encouraging.
>
> >
> > I'm inclined to remove the waitqueue_active() alltogether. It's
> > creating more headache than it's worth.
> If I am understanding well, removing the checking of waitqueue_active(),
> and call wakeup() directly which will check list with spinlock protection.

Correct.

> If so, I can prepare one patch for it:)

Appreciated.

tglx