Release reader lock on r/w sem before stopping khidp thread (which needs to
claim the writer lock on sem before unlinking the session).
NB: kthread_stop waits for thread completion.
---
net/bluetooth/hidp/core.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index c405a95..16d75d7 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -1096,6 +1096,7 @@ int hidp_del_connection(struct hidp_conndel_req *req)
{
struct hidp_session *session;
int err = 0;
+ bool stop_thread = false;
BT_DBG("");
@@ -1111,12 +1112,14 @@ int hidp_del_connection(struct hidp_conndel_req *req)
skb_queue_purge(&session->ctrl_transmit);
skb_queue_purge(&session->intr_transmit);
- kthread_stop(session->task);
+ stop_thread = true;
}
} else
err = -ENOENT;
up_read(&hidp_session_sem);
+ if (stop_thread)
+ kthread_stop(session->task);
return err;
}
--
1.7.4.1
* Ilia, Kolominsky <[email protected]> [2011-06-30 16:55:52 +0200]:
> > -----Original Message-----
> > From: Peter Hurley [mailto:[email protected]]
> > Sent: Thursday, June 30, 2011 5:35 PM
> > To: Gustavo F. Padovan
> > Cc: Ilia, Kolominsky; linux-bluetooth
> > Subject: Re: [PATCH] Bluetooth: Fix hidp disconnect deadlock
> >
> > On Wed, 2011-06-29 at 16:52 -0400, Gustavo F. Padovan wrote:
> > > * Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56 -
> > 0300]:
> > >
> > > > * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> > > >
> > > > > Hi!
> > > > > IMHO the fix isnt good due to possible race condition which
> > > > > will destroy session/task objects - either by a call to
> > kthread_stop
> > > > > from the timer func or reentry to hidp_del_connection() on
> > > > > smp platforms.
> > ....
> > > This should fix the timer issue. Please test.
> > >
> > > Gustavo
> >
> > Hi Ilia & Gustavo,
> >
> > After Ilia pointed out the problem with the timer function, I went back
> > and reviewed *all* the synchronization code relevant to the hid session
> > thread.
> >
> > A number of problems were introduced with commit aabf6f89 - when the
> > session thread was converted from a kernel_thread to a kthread.
> > Although
> > a kthread is a better choice for representing the session thread, the
> > naive conversion of atomic/wakeup to kthread_stop() was inappropriate.
> >
> > kthread_stop() has usage semantics that different significantly from
> > atomic/wakeup. As we already know, because kthread_stop() blocks on
> > thread completion, it can introduce deadlocks in code that already uses
> > exclusion mechanisms. Even with Ilia's new patch, consider the
> > following
> > sequence:
> >
> > Thread 0 Thread 1 Thread 2
> > in hidp_del_connection in hidp session
> > claim r/w sem .
> > timer triggers .
> > kthread_stop() --------->.
> > *blocks on thread 2* exits loop
> > *blocks for r/w
>
> Guys, as far as I know - it is not possible to use kthread_stop from
> The timer func - since it may sleep and we are in soft_irq...
> What I did is added kthread_stop_async in the kthread use it in both
> Timer func and hidp_del_connection(). I am still polishing the code
> though...
Yes, you right, we really can't call kthread_stop there.
> Gustavo, is adding the new api to kthread feasible?
> How to upstream such change?
I don't know the kthread internals to tell you if this is feasible or not. You
have to send the patch to the maintainer of kthread
(./scripts/get_maintainer.pl tells you the maintainer) and explain why you
need this to him.
But we also need a solution in the short term, this week, in order to fix this
in the up coming 3.0 release.
Gustavo
* Peter Hurley <[email protected]> [2011-06-30 10:34:35 -0400]:
> On Wed, 2011-06-29 at 16:52 -0400, Gustavo F. Padovan wrote:
> > * Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56 -0300]:
> >
> > > * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> > >
> > > > Hi!
> > > > IMHO the fix isnt good due to possible race condition which
> > > > will destroy session/task objects - either by a call to kthread_stop
> > > > from the timer func or reentry to hidp_del_connection() on
> > > > smp platforms.
> ....
> > This should fix the timer issue. Please test.
> >
> > Gustavo
>
> Hi Ilia & Gustavo,
>
> After Ilia pointed out the problem with the timer function, I went back
> and reviewed *all* the synchronization code relevant to the hid session
> thread.
>
> A number of problems were introduced with commit aabf6f89 - when the
> session thread was converted from a kernel_thread to a kthread. Although
> a kthread is a better choice for representing the session thread, the
> naive conversion of atomic/wakeup to kthread_stop() was inappropriate.
>
> kthread_stop() has usage semantics that different significantly from
> atomic/wakeup. As we already know, because kthread_stop() blocks on
> thread completion, it can introduce deadlocks in code that already uses
> exclusion mechanisms. Even with Ilia's new patch, consider the following
> sequence:
>
> Thread 0 Thread 1 Thread 2
> in hidp_del_connection in hidp session
> claim r/w sem .
> timer triggers .
> kthread_stop() --------->.
> *blocks on thread 2* exits loop
> *blocks for r/w sem*
> in hidp_del_timer
> del_timer_sync()
> *blocks on thread 1*
>
> Deadlock occurs because:
> + thread 0 holds reader lock but is waiting for the timer function on
> thread 1 to finish
> + thread 1 has stopped kthread and is waiting for thread completion
> + thread 2 (aka kthread) is waiting to claim writer lock held by thread
> 0
>
> In addition to the deadlocks and races, kthread_stop() is being called
> by hidp_process_hid_control() which is *in the session thread context* -
> kthread_stop() cannot be called on itself!
>
> I've been testing patch v2 since Monday which continues to use kthread
> but reverts back to the old behavior of atomic/wakeup. It also fixes the
> potential for a lost wakeup. Of course, "testing" means running it and
> looking at it carefully - as someone on IRC pointed out, kthread really
> needs to get instrumented with lockdep.
Where is this v2 patch? I wanna take a look on it.
Gustavo
> -----Original Message-----
> From: Peter Hurley [mailto:[email protected]]
> Sent: Thursday, June 30, 2011 5:35 PM
> To: Gustavo F. Padovan
> Cc: Ilia, Kolominsky; linux-bluetooth
> Subject: Re: [PATCH] Bluetooth: Fix hidp disconnect deadlock
>
> On Wed, 2011-06-29 at 16:52 -0400, Gustavo F. Padovan wrote:
> > * Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56 -
> 0300]:
> >
> > > * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> > >
> > > > Hi!
> > > > IMHO the fix isnt good due to possible race condition which
> > > > will destroy session/task objects - either by a call to
> kthread_stop
> > > > from the timer func or reentry to hidp_del_connection() on
> > > > smp platforms.
> ....
> > This should fix the timer issue. Please test.
> >
> > Gustavo
>
> Hi Ilia & Gustavo,
>
> After Ilia pointed out the problem with the timer function, I went back
> and reviewed *all* the synchronization code relevant to the hid session
> thread.
>
> A number of problems were introduced with commit aabf6f89 - when the
> session thread was converted from a kernel_thread to a kthread.
> Although
> a kthread is a better choice for representing the session thread, the
> naive conversion of atomic/wakeup to kthread_stop() was inappropriate.
>
> kthread_stop() has usage semantics that different significantly from
> atomic/wakeup. As we already know, because kthread_stop() blocks on
> thread completion, it can introduce deadlocks in code that already uses
> exclusion mechanisms. Even with Ilia's new patch, consider the
> following
> sequence:
>
> Thread 0 Thread 1 Thread 2
> in hidp_del_connection in hidp session
> claim r/w sem .
> timer triggers .
> kthread_stop() --------->.
> *blocks on thread 2* exits loop
> *blocks for r/w
Guys, as far as I know - it is not possible to use kthread_stop from
The timer func - since it may sleep and we are in soft_irq...
What I did is added kthread_stop_async in the kthread use it in both
Timer func and hidp_del_connection(). I am still polishing the code
though...
Gustavo, is adding the new api to kthread feasible?
How to upstream such change?
> sem*
> in hidp_del_timer
> del_timer_sync()
> *blocks on thread 1*
>
> Deadlock occurs because:
> + thread 0 holds reader lock but is waiting for the timer function on
> thread 1 to finish
> + thread 1 has stopped kthread and is waiting for thread completion
> + thread 2 (aka kthread) is waiting to claim writer lock held by thread
> 0
>
> In addition to the deadlocks and races, kthread_stop() is being called
> by hidp_process_hid_control() which is *in the session thread context*
> -
> kthread_stop() cannot be called on itself!
>
> I've been testing patch v2 since Monday which continues to use kthread
> but reverts back to the old behavior of atomic/wakeup. It also fixes
> the
> potential for a lost wakeup. Of course, "testing" means running it and
> looking at it carefully - as someone on IRC pointed out, kthread really
> needs to get instrumented with lockdep.
>
> Regards,
> Peter Hurley
On Thursday, June 30, 2011 10:41:51 AM Peter Hurley did opine:
> On Wed, 2011-06-29 at 16:52 -0400, Gustavo F. Padovan wrote:
> > * Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56
-0300]:
> > > * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> > > > Hi!
> > > > IMHO the fix isnt good due to possible race condition which
> > > > will destroy session/task objects - either by a call to
> > > > kthread_stop from the timer func or reentry to
> > > > hidp_del_connection() on smp platforms.
>
> ....
>
> > This should fix the timer issue. Please test.
> >
> > Gustavo
>
> Hi Ilia & Gustavo,
>
> After Ilia pointed out the problem with the timer function, I went back
> and reviewed *all* the synchronization code relevant to the hid session
> thread.
>
> A number of problems were introduced with commit aabf6f89 - when the
> session thread was converted from a kernel_thread to a kthread. Although
> a kthread is a better choice for representing the session thread, the
> naive conversion of atomic/wakeup to kthread_stop() was inappropriate.
>
> kthread_stop() has usage semantics that different significantly from
> atomic/wakeup. As we already know, because kthread_stop() blocks on
> thread completion, it can introduce deadlocks in code that already uses
> exclusion mechanisms. Even with Ilia's new patch, consider the following
> sequence:
>
> Thread 0 Thread 1 Thread 2
> in hidp_del_connection in hidp session
> claim r/w sem .
> timer triggers .
> kthread_stop() --------->.
> *blocks on thread 2* exits loop
> *blocks for r/w sem*
> in hidp_del_timer
> del_timer_sync()
> *blocks on thread 1*
>
> Deadlock occurs because:
> + thread 0 holds reader lock but is waiting for the timer function on
> thread 1 to finish
> + thread 1 has stopped kthread and is waiting for thread completion
> + thread 2 (aka kthread) is waiting to claim writer lock held by thread
> 0
>
> In addition to the deadlocks and races, kthread_stop() is being called
> by hidp_process_hid_control() which is *in the session thread context* -
> kthread_stop() cannot be called on itself!
>
> I've been testing patch v2 since Monday which continues to use kthread
> but reverts back to the old behavior of atomic/wakeup. It also fixes the
> potential for a lost wakeup. Of course, "testing" means running it and
> looking at it carefully - as someone on IRC pointed out, kthread really
> needs to get instrumented with lockdep.
>
> Regards,
> Peter Hurley
Peter; You may have hit on a problem I'm seeing here also, to wit:
I can reboot, start bluetoothd by hand, and using blueman-manager make a
connection that works well despite the signal being reported as a bit less
than perfect since the two devices are a couple of walls and 20 feet apart.
It will work for 12 to 18 hours, then die 'deadlocked' and cannot be
restored without a reboot.
Seems to me we at least have similar ducks, they both waddle. ;-)
Cheers, gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
When God endowed human beings with brains, He did not intend to guarantee
them.
On Wed, 2011-06-29 at 16:52 -0400, Gustavo F. Padovan wrote:
> * Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56 -0300]:
>
> > * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> >
> > > Hi!
> > > IMHO the fix isnt good due to possible race condition which
> > > will destroy session/task objects - either by a call to kthread_stop
> > > from the timer func or reentry to hidp_del_connection() on
> > > smp platforms.
....
> This should fix the timer issue. Please test.
>
> Gustavo
Hi Ilia & Gustavo,
After Ilia pointed out the problem with the timer function, I went back
and reviewed *all* the synchronization code relevant to the hid session
thread.
A number of problems were introduced with commit aabf6f89 - when the
session thread was converted from a kernel_thread to a kthread. Although
a kthread is a better choice for representing the session thread, the
naive conversion of atomic/wakeup to kthread_stop() was inappropriate.
kthread_stop() has usage semantics that different significantly from
atomic/wakeup. As we already know, because kthread_stop() blocks on
thread completion, it can introduce deadlocks in code that already uses
exclusion mechanisms. Even with Ilia's new patch, consider the following
sequence:
Thread 0 Thread 1 Thread 2
in hidp_del_connection in hidp session
claim r/w sem .
timer triggers .
kthread_stop() --------->.
*blocks on thread 2* exits loop
*blocks for r/w sem*
in hidp_del_timer
del_timer_sync()
*blocks on thread 1*
Deadlock occurs because:
+ thread 0 holds reader lock but is waiting for the timer function on
thread 1 to finish
+ thread 1 has stopped kthread and is waiting for thread completion
+ thread 2 (aka kthread) is waiting to claim writer lock held by thread
0
In addition to the deadlocks and races, kthread_stop() is being called
by hidp_process_hid_control() which is *in the session thread context* -
kthread_stop() cannot be called on itself!
I've been testing patch v2 since Monday which continues to use kthread
but reverts back to the old behavior of atomic/wakeup. It also fixes the
potential for a lost wakeup. Of course, "testing" means running it and
looking at it carefully - as someone on IRC pointed out, kthread really
needs to get instrumented with lockdep.
Regards,
Peter Hurley
* Gustavo F. Padovan <[email protected]> [2011-06-29 17:24:56 -0300]:
> * Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
>
> > Hi!
> > IMHO the fix isnt good due to possible race condition which
> > will destroy session/task objects - either by a call to kthread_stop
> > from the timer func or reentry to hidp_del_connection() on
> > smp platforms.
> > see the intopic comments
> >
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:linux-bluetooth-
> > > [email protected]] On Behalf Of Peter Hurley
> > > Sent: Sunday, June 26, 2011 12:33 AM
> > > To: linux-bluetooth
> > > Subject: [PATCH] Bluetooth: Fix hidp disconnect deadlock
> > >
> > > Release reader lock on r/w sem before stopping khidp thread (which
> > > needs to
> > > claim the writer lock on sem before unlinking the session).
> > >
> > > NB: kthread_stop waits for thread completion.
> > > ---
> > > net/bluetooth/hidp/core.c | 5 ++++-
> > > 1 files changed, 4 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> > > index c405a95..16d75d7 100644
> > > --- a/net/bluetooth/hidp/core.c
> > > +++ b/net/bluetooth/hidp/core.c
> > > @@ -1096,6 +1096,7 @@ int hidp_del_connection(struct hidp_conndel_req
> > > *req)
> > > {
> > > struct hidp_session *session;
> > > int err = 0;
> > > + bool stop_thread = false;
> > >
> > > BT_DBG("");
> > >
> > > @@ -1111,12 +1112,14 @@ int hidp_del_connection(struct hidp_conndel_req
> > > *req)
> > > skb_queue_purge(&session->ctrl_transmit);
> > > skb_queue_purge(&session->intr_transmit);
> > >
> > > - kthread_stop(session->task);
> > > + stop_thread = true;
> > > }
> > > } else
> > > err = -ENOENT;
> > >
> > > up_read(&hidp_session_sem);
> > > + if (stop_thread)
> >
> > Timer fires here - session is destroyed.
> >
> > > + kthread_stop(session->task);
> > > return err;
> > > }
> > >
> > > --
> > > 1.7.4.1
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> > > bluetooth" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > I am working on a better solution which will ensure the async deletion
> > of session kthread operates correctly from different exec. contexts
>
> Are you going to deliver this soon? This needs to be fixed on 3.0.
This should fix the timer issue. Please test.
Gustavo
Author: Gustavo F. Padovan <[email protected]>
Date: Wed Jun 29 17:45:56 2011 -0300
Bluetooth: Delete timer before call kthread_stop()
If we keep the timer running we can have a race condition between the
timer and hidp_del_connection. By deleting the time before call
kthread_stop() we won't have any race condition.
Signed-off-by: Gustavo F. Padovan <[email protected]>
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 16d75d7..a86ba69 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -476,7 +476,7 @@ static void hidp_set_timer(struct hidp_session *session)
static inline void hidp_del_timer(struct hidp_session *session)
{
if (session->idle_to > 0)
- del_timer(&session->timer);
+ del_timer_sync(&session->timer);
}
static void hidp_process_handshake(struct hidp_session *session,
@@ -1113,6 +1113,8 @@ int hidp_del_connection(struct hidp_conndel_req *req)
skb_queue_purge(&session->intr_transmit);
stop_thread = true;
+
+ hidp_del_timer(session);
}
} else
err = -ENOENT;
~
* Ilia, Kolominsky <[email protected]> [2011-06-26 09:16:58 +0200]:
> Hi!
> IMHO the fix isnt good due to possible race condition which
> will destroy session/task objects - either by a call to kthread_stop
> from the timer func or reentry to hidp_del_connection() on
> smp platforms.
> see the intopic comments
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:linux-bluetooth-
> > [email protected]] On Behalf Of Peter Hurley
> > Sent: Sunday, June 26, 2011 12:33 AM
> > To: linux-bluetooth
> > Subject: [PATCH] Bluetooth: Fix hidp disconnect deadlock
> >
> > Release reader lock on r/w sem before stopping khidp thread (which
> > needs to
> > claim the writer lock on sem before unlinking the session).
> >
> > NB: kthread_stop waits for thread completion.
> > ---
> > net/bluetooth/hidp/core.c | 5 ++++-
> > 1 files changed, 4 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> > index c405a95..16d75d7 100644
> > --- a/net/bluetooth/hidp/core.c
> > +++ b/net/bluetooth/hidp/core.c
> > @@ -1096,6 +1096,7 @@ int hidp_del_connection(struct hidp_conndel_req
> > *req)
> > {
> > struct hidp_session *session;
> > int err = 0;
> > + bool stop_thread = false;
> >
> > BT_DBG("");
> >
> > @@ -1111,12 +1112,14 @@ int hidp_del_connection(struct hidp_conndel_req
> > *req)
> > skb_queue_purge(&session->ctrl_transmit);
> > skb_queue_purge(&session->intr_transmit);
> >
> > - kthread_stop(session->task);
> > + stop_thread = true;
> > }
> > } else
> > err = -ENOENT;
> >
> > up_read(&hidp_session_sem);
> > + if (stop_thread)
>
> Timer fires here - session is destroyed.
>
> > + kthread_stop(session->task);
> > return err;
> > }
> >
> > --
> > 1.7.4.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > bluetooth" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> I am working on a better solution which will ensure the async deletion
> of session kthread operates correctly from different exec. contexts
Are you going to deliver this soon? This needs to be fixed on 3.0.
Gustavo
Hi!
IMHO the fix isnt good due to possible race condition which
will destroy session/task objects - either by a call to kthread_stop
from the timer func or reentry to hidp_del_connection() on
smp platforms.
see the intopic comments
> -----Original Message-----
> From: [email protected] [mailto:linux-bluetooth-
> [email protected]] On Behalf Of Peter Hurley
> Sent: Sunday, June 26, 2011 12:33 AM
> To: linux-bluetooth
> Subject: [PATCH] Bluetooth: Fix hidp disconnect deadlock
>
> Release reader lock on r/w sem before stopping khidp thread (which
> needs to
> claim the writer lock on sem before unlinking the session).
>
> NB: kthread_stop waits for thread completion.
> ---
> net/bluetooth/hidp/core.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> index c405a95..16d75d7 100644
> --- a/net/bluetooth/hidp/core.c
> +++ b/net/bluetooth/hidp/core.c
> @@ -1096,6 +1096,7 @@ int hidp_del_connection(struct hidp_conndel_req
> *req)
> {
> struct hidp_session *session;
> int err = 0;
> + bool stop_thread = false;
>
> BT_DBG("");
>
> @@ -1111,12 +1112,14 @@ int hidp_del_connection(struct hidp_conndel_req
> *req)
> skb_queue_purge(&session->ctrl_transmit);
> skb_queue_purge(&session->intr_transmit);
>
> - kthread_stop(session->task);
> + stop_thread = true;
> }
> } else
> err = -ENOENT;
>
> up_read(&hidp_session_sem);
> + if (stop_thread)
Timer fires here - session is destroyed.
> + kthread_stop(session->task);
> return err;
> }
>
> --
> 1.7.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-
> bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
I am working on a better solution which will ensure the async deletion
of session kthread operates correctly from different exec. contexts
Regards
Ilia Kolominsky
[email protected]
Direct: +972(9)7906231
Mobile: +972(54)909009