2014-10-02 11:09:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Tue, Sep 30, 2014 at 04:02:28PM +0800, Fengguang Wu wrote:
> Hi Peter,
>
> We possibly find a rfcomm bug (maintainers CCed) exposed by your debug patch
>
> [ 1.861895] NET: Registered protocol family 5
> [ 1.862978] Bluetooth: RFCOMM TTY layer initialized
> [ 1.863099] ------------[ cut here ]------------
> [ 1.863105] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep+0x17d/0x1a1()
> [ 1.863112] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c14dc381>] rfcomm_run+0xdf/0x130e
> [ 1.863591] [<c1058b73>] ? kthread_stop+0x53/0x53
> [ 1.864906] [<c155a411>] dump_stack+0x48/0x60
> [ 1.866298] [<c14dc381>] ? rfcomm_run+0xdf/0x130e

Ha yes, rfcomm_run is a complete buggy mess indeed. Lemme go see what I
can make of it.


2014-10-02 12:31:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 01:09:27PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 30, 2014 at 04:02:28PM +0800, Fengguang Wu wrote:
> > Hi Peter,
> >
> > We possibly find a rfcomm bug (maintainers CCed) exposed by your debug patch
> >
> > [ 1.861895] NET: Registered protocol family 5
> > [ 1.862978] Bluetooth: RFCOMM TTY layer initialized
> > [ 1.863099] ------------[ cut here ]------------
> > [ 1.863105] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep+0x17d/0x1a1()
> > [ 1.863112] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c14dc381>] rfcomm_run+0xdf/0x130e
> > [ 1.863591] [<c1058b73>] ? kthread_stop+0x53/0x53
> > [ 1.864906] [<c155a411>] dump_stack+0x48/0x60
> > [ 1.866298] [<c14dc381>] ? rfcomm_run+0xdf/0x130e
>
> Ha yes, rfcomm_run is a complete buggy mess indeed. Lemme go see what I
> can make of it.

---
Subject: rfcomm: Fix broken wait construct

rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.

While at it, rename rfcomm_schedule to rfcomm_wake, since that is what
it actually does.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
net/bluetooth/rfcomm/core.c | 46 +++++++++++++++++++++-----------------------
1 file changed, 22 insertions(+), 24 deletions(-)

--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -101,11 +101,11 @@ static struct rfcomm_session *rfcomm_ses
#define __get_rpn_stop_bits(line) (((line) >> 2) & 0x1)
#define __get_rpn_parity(line) (((line) >> 3) & 0x7)

-static void rfcomm_schedule(void)
+static DECLARE_WAIT_QUEUE_HEAD(rfcomm_wq);
+
+static void rfcomm_wake(void)
{
- if (!rfcomm_thread)
- return;
- wake_up_process(rfcomm_thread);
+ wake_up_all(&rfcomm_wq);
}

/* ---- RFCOMM FCS computation ---- */
@@ -183,13 +183,13 @@ static inline int __check_fcs(u8 *data,
static void rfcomm_l2state_change(struct sock *sk)
{
BT_DBG("%p state %d", sk, sk->sk_state);
- rfcomm_schedule();
+ rfcomm_wake();
}

static void rfcomm_l2data_ready(struct sock *sk)
{
BT_DBG("%p", sk);
- rfcomm_schedule();
+ rfcomm_wake();
}

static int rfcomm_l2sock_create(struct socket **sock)
@@ -238,7 +238,7 @@ static void rfcomm_session_timeout(unsig
BT_DBG("session %p state %ld", s, s->state);

set_bit(RFCOMM_TIMED_OUT, &s->flags);
- rfcomm_schedule();
+ rfcomm_wake();
}

static void rfcomm_session_set_timer(struct rfcomm_session *s, long timeout)
@@ -264,7 +264,7 @@ static void rfcomm_dlc_timeout(unsigned

set_bit(RFCOMM_TIMED_OUT, &d->flags);
rfcomm_dlc_put(d);
- rfcomm_schedule();
+ rfcomm_wake();
}

static void rfcomm_dlc_set_timer(struct rfcomm_dlc *d, long timeout)
@@ -462,7 +462,7 @@ static int __rfcomm_dlc_close(struct rfc
case BT_CONNECT2:
if (test_and_clear_bit(RFCOMM_DEFER_SETUP, &d->flags)) {
set_bit(RFCOMM_AUTH_REJECT, &d->flags);
- rfcomm_schedule();
+ rfcomm_wake();
return 0;
}
}
@@ -566,7 +566,7 @@ int rfcomm_dlc_send(struct rfcomm_dlc *d
skb_queue_tail(&d->tx_queue, skb);

if (!test_bit(RFCOMM_TX_THROTTLED, &d->flags))
- rfcomm_schedule();
+ rfcomm_wake();
return len;
}

@@ -581,7 +581,7 @@ void rfcomm_dlc_send_noerror(struct rfco

if (d->state == BT_CONNECTED &&
!test_bit(RFCOMM_TX_THROTTLED, &d->flags))
- rfcomm_schedule();
+ rfcomm_wake();
}

void __rfcomm_dlc_throttle(struct rfcomm_dlc *d)
@@ -592,7 +592,7 @@ void __rfcomm_dlc_throttle(struct rfcomm
d->v24_sig |= RFCOMM_V24_FC;
set_bit(RFCOMM_MSC_PENDING, &d->flags);
}
- rfcomm_schedule();
+ rfcomm_wake();
}

void __rfcomm_dlc_unthrottle(struct rfcomm_dlc *d)
@@ -603,7 +603,7 @@ void __rfcomm_dlc_unthrottle(struct rfco
d->v24_sig &= ~RFCOMM_V24_FC;
set_bit(RFCOMM_MSC_PENDING, &d->flags);
}
- rfcomm_schedule();
+ rfcomm_wake();
}

/*
@@ -624,7 +624,7 @@ int rfcomm_dlc_set_modem_status(struct r
d->v24_sig = v24_sig;

if (!test_and_set_bit(RFCOMM_MSC_PENDING, &d->flags))
- rfcomm_schedule();
+ rfcomm_wake();

return 0;
}
@@ -872,7 +872,7 @@ static int rfcomm_queue_disc(struct rfco
cmd->fcs = __fcs2((u8 *) cmd);

skb_queue_tail(&d->tx_queue, skb);
- rfcomm_schedule();
+ rfcomm_wake();
return 0;
}

@@ -1952,7 +1952,7 @@ static void rfcomm_accept_connection(str
s->mtu = min(l2cap_pi(nsock->sk)->chan->omtu,
l2cap_pi(nsock->sk)->chan->imtu) - 5;

- rfcomm_schedule();
+ rfcomm_wake();
} else
sock_release(nsock);
}
@@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)

static int rfcomm_run(void *unused)
{
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
BT_DBG("");

set_user_nice(current, -10);

rfcomm_add_listener(BDADDR_ANY);

- while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
-
- if (kthread_should_stop())
- break;
+ add_wait_queue(&rfcomm_wq, &wait);
+ while (!kthread_should_stop()) {

/* Process stuff */
rfcomm_process_sessions();

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
+ remove_wait_queue(&rfcomm_wq, &wait);

rfcomm_kill_listener();

@@ -2154,7 +2152,7 @@ static void rfcomm_security_cfm(struct h
set_bit(RFCOMM_AUTH_REJECT, &d->flags);
}

- rfcomm_schedule();
+ rfcomm_wake();
}

static struct hci_cb rfcomm_cb = {

2014-10-02 12:38:54

by Peter Hurley

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02/2014 08:31 AM, Peter Zijlstra wrote:
> On Thu, Oct 02, 2014 at 01:09:27PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 30, 2014 at 04:02:28PM +0800, Fengguang Wu wrote:
>>> Hi Peter,
>>>
>>> We possibly find a rfcomm bug (maintainers CCed) exposed by your debug patch
>>>
>>> [ 1.861895] NET: Registered protocol family 5
>>> [ 1.862978] Bluetooth: RFCOMM TTY layer initialized
>>> [ 1.863099] ------------[ cut here ]------------
>>> [ 1.863105] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep+0x17d/0x1a1()
>>> [ 1.863112] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c14dc381>] rfcomm_run+0xdf/0x130e
>>> [ 1.863591] [<c1058b73>] ? kthread_stop+0x53/0x53
>>> [ 1.864906] [<c155a411>] dump_stack+0x48/0x60
>>> [ 1.866298] [<c14dc381>] ? rfcomm_run+0xdf/0x130e
>>
>> Ha yes, rfcomm_run is a complete buggy mess indeed. Lemme go see what I
>> can make of it.
>
> ---
> Subject: rfcomm: Fix broken wait construct
>
> rfcomm_run() is a tad broken in that is has a nested wait loop. One
> cannot rely on p->state for the outer wait because the inner wait will
> overwrite it.
>
> While at it, rename rfcomm_schedule to rfcomm_wake, since that is what
> it actually does.

rfcomm_schedule() as in schedule_work(), which is how it's used.

Regards,
Peter Hurley

2014-10-02 12:42:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 02:31:50PM +0200, Peter Zijlstra wrote:
> @@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
>
> static int rfcomm_run(void *unused)
> {
> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
> BT_DBG("");
>
> set_user_nice(current, -10);
>
> rfcomm_add_listener(BDADDR_ANY);
>
> - while (1) {
> - set_current_state(TASK_INTERRUPTIBLE);
> -
> - if (kthread_should_stop())
> - break;
> + add_wait_queue(&rfcomm_wq, &wait);
> + while (!kthread_should_stop()) {
>
> /* Process stuff */
> rfcomm_process_sessions();
>
> - schedule();
> + wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> }
> - __set_current_state(TASK_RUNNING);
> + remove_wait_queue(&rfcomm_wq, &wait);
>
> rfcomm_kill_listener();
>

Hmm, I think there's a problem there. If someone were to do
kthread_stop() before wait_woken() we'd not actually stop, because
wait_woken() doesn't test KTHREAD_SHOULD_STOP before calling schedule().

We can't unconditionally put a kthread_should_stop() in because
to_kthread() would explode on a !kthread. The other obvious solution is
adding a second function, something like wait_woken_or_stop(), but that
appears somewhat ugly to me.

Oleg, do you see another solution?

2014-10-02 12:54:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 08:38:46AM -0400, Peter Hurley wrote:
> On 10/02/2014 08:31 AM, Peter Zijlstra wrote:
> > On Thu, Oct 02, 2014 at 01:09:27PM +0200, Peter Zijlstra wrote:
> >> On Tue, Sep 30, 2014 at 04:02:28PM +0800, Fengguang Wu wrote:
> >>> Hi Peter,
> >>>
> >>> We possibly find a rfcomm bug (maintainers CCed) exposed by your debug patch
> >>>
> >>> [ 1.861895] NET: Registered protocol family 5
> >>> [ 1.862978] Bluetooth: RFCOMM TTY layer initialized
> >>> [ 1.863099] ------------[ cut here ]------------
> >>> [ 1.863105] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep+0x17d/0x1a1()
> >>> [ 1.863112] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c14dc381>] rfcomm_run+0xdf/0x130e
> >>> [ 1.863591] [<c1058b73>] ? kthread_stop+0x53/0x53
> >>> [ 1.864906] [<c155a411>] dump_stack+0x48/0x60
> >>> [ 1.866298] [<c14dc381>] ? rfcomm_run+0xdf/0x130e
> >>
> >> Ha yes, rfcomm_run is a complete buggy mess indeed. Lemme go see what I
> >> can make of it.
> >
> > ---
> > Subject: rfcomm: Fix broken wait construct
> >
> > rfcomm_run() is a tad broken in that is has a nested wait loop. One
> > cannot rely on p->state for the outer wait because the inner wait will
> > overwrite it.
> >
> > While at it, rename rfcomm_schedule to rfcomm_wake, since that is what
> > it actually does.
>
> rfcomm_schedule() as in schedule_work(), which is how it's used.

Not really, all it does is wake the rfcomm_thread. The thread then does
a linear walk of all known sessions looking for work -- which is clearly
suboptimal as well, but I didn't feel like fixing that.

Also, the current implementation already disagrees with you, all it
basically does it call wake_up_process() which is a big clue right
there.

2014-10-02 13:05:46

by Peter Hurley

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02/2014 08:54 AM, Peter Zijlstra wrote:
> On Thu, Oct 02, 2014 at 08:38:46AM -0400, Peter Hurley wrote:
>> On 10/02/2014 08:31 AM, Peter Zijlstra wrote:
>>> On Thu, Oct 02, 2014 at 01:09:27PM +0200, Peter Zijlstra wrote:
>>>> On Tue, Sep 30, 2014 at 04:02:28PM +0800, Fengguang Wu wrote:
>>>>> Hi Peter,
>>>>>
>>>>> We possibly find a rfcomm bug (maintainers CCed) exposed by your debug patch
>>>>>
>>>>> [ 1.861895] NET: Registered protocol family 5
>>>>> [ 1.862978] Bluetooth: RFCOMM TTY layer initialized
>>>>> [ 1.863099] ------------[ cut here ]------------
>>>>> [ 1.863105] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep+0x17d/0x1a1()
>>>>> [ 1.863112] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c14dc381>] rfcomm_run+0xdf/0x130e
>>>>> [ 1.863591] [<c1058b73>] ? kthread_stop+0x53/0x53
>>>>> [ 1.864906] [<c155a411>] dump_stack+0x48/0x60
>>>>> [ 1.866298] [<c14dc381>] ? rfcomm_run+0xdf/0x130e
>>>>
>>>> Ha yes, rfcomm_run is a complete buggy mess indeed. Lemme go see what I
>>>> can make of it.
>>>
>>> ---
>>> Subject: rfcomm: Fix broken wait construct
>>>
>>> rfcomm_run() is a tad broken in that is has a nested wait loop. One
>>> cannot rely on p->state for the outer wait because the inner wait will
>>> overwrite it.
>>>
>>> While at it, rename rfcomm_schedule to rfcomm_wake, since that is what
>>> it actually does.
>>
>> rfcomm_schedule() as in schedule_work(), which is how it's used.
>
> Not really, all it does is wake the rfcomm_thread. The thread then does
> a linear walk of all known sessions looking for work -- which is clearly
> suboptimal as well, but I didn't feel like fixing that.
>
> Also, the current implementation already disagrees with you, all it
> basically does it call wake_up_process() which is a big clue right
> there.

You're thinking of it from the point of view of the scheduler, so to you
it should be named what it does.

However, from the users' point of view, it's an abstraction of work
dispatching; the fact that a kthread (which needs waking) does the work
is irrelevant.

Consider if the kthread is converted to work_structs instead and your now-
renamed rfcomm_wake() is calling schedule_work().

Regards,
Peter Hurley

2014-10-02 13:42:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 09:05:42AM -0400, Peter Hurley wrote:

> >>> While at it, rename rfcomm_schedule to rfcomm_wake, since that is what
> >>> it actually does.
> >>
> >> rfcomm_schedule() as in schedule_work(), which is how it's used.
> >
> > Not really, all it does is wake the rfcomm_thread. The thread then does
> > a linear walk of all known sessions looking for work -- which is clearly
> > suboptimal as well, but I didn't feel like fixing that.
> >
> > Also, the current implementation already disagrees with you, all it
> > basically does it call wake_up_process() which is a big clue right
> > there.
>
> You're thinking of it from the point of view of the scheduler, so to you
> it should be named what it does.

Of course I am, that thing is called 'schedule' so its natural to think
about the scheduler :-)

> However, from the users' point of view, it's an abstraction of work
> dispatching; the fact that a kthread (which needs waking) does the work
> is irrelevant.

Still a misnomer, see below.

> Consider if the kthread is converted to work_structs instead and your now-
> renamed rfcomm_wake() is calling schedule_work().

Then it would probably be less buggy and more efficient -- where I'm
assuming it would queue work per session and avoid the endless scanning
of sessions.

Also schedule_work() is somewhat sanely named in that you schedule the
work for later execution, so here we can use the term. The thread
however might already be scheduled or even running, so there it is not
appropriate.

2014-10-02 13:49:12

by Peter Hurley

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02/2014 08:42 AM, Peter Zijlstra wrote:
> On Thu, Oct 02, 2014 at 02:31:50PM +0200, Peter Zijlstra wrote:
>> @@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
>>
>> static int rfcomm_run(void *unused)
>> {
>> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
>> BT_DBG("");
>>
>> set_user_nice(current, -10);
>>
>> rfcomm_add_listener(BDADDR_ANY);
>>
>> - while (1) {
>> - set_current_state(TASK_INTERRUPTIBLE);
>> -
>> - if (kthread_should_stop())
>> - break;
>> + add_wait_queue(&rfcomm_wq, &wait);
>> + while (!kthread_should_stop()) {
>>
>> /* Process stuff */
>> rfcomm_process_sessions();
>>
>> - schedule();
>> + wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>> }
>> - __set_current_state(TASK_RUNNING);
>> + remove_wait_queue(&rfcomm_wq, &wait);
>>
>> rfcomm_kill_listener();
>>
>
> Hmm, I think there's a problem there. If someone were to do
> kthread_stop() before wait_woken() we'd not actually stop, because
> wait_woken() doesn't test KTHREAD_SHOULD_STOP before calling schedule().

Do you mean this situation?

CPU 0 | CPU 1
|
rfcomm_run() | kthread_stop()
... |
if (!test_bit(KTHREAD_SHOULD_STOP)) |
| set_bit(KTHREAD_SHOULD_STOP)
| wake_up_process()
wait_woken() | wait_for_completion()
set_current_state(INTERRUPTIBLE) |
if (!WQ_FLAG_WOKEN) |
schedule_timeout() |
|

Now both tasks are sleeping forever.

If yes, then wakeups from signals don't work either, right?

Regards,
Peter Hurley

2014-10-02 13:52:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 09:49:04AM -0400, Peter Hurley wrote:
> On 10/02/2014 08:42 AM, Peter Zijlstra wrote:
> > On Thu, Oct 02, 2014 at 02:31:50PM +0200, Peter Zijlstra wrote:
> >> @@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
> >>
> >> static int rfcomm_run(void *unused)
> >> {
> >> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
> >> BT_DBG("");
> >>
> >> set_user_nice(current, -10);
> >>
> >> rfcomm_add_listener(BDADDR_ANY);
> >>
> >> - while (1) {
> >> - set_current_state(TASK_INTERRUPTIBLE);
> >> -
> >> - if (kthread_should_stop())
> >> - break;
> >> + add_wait_queue(&rfcomm_wq, &wait);
> >> + while (!kthread_should_stop()) {
> >>
> >> /* Process stuff */
> >> rfcomm_process_sessions();
> >>
> >> - schedule();
> >> + wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> >> }
> >> - __set_current_state(TASK_RUNNING);
> >> + remove_wait_queue(&rfcomm_wq, &wait);
> >>
> >> rfcomm_kill_listener();
> >>
> >
> > Hmm, I think there's a problem there. If someone were to do
> > kthread_stop() before wait_woken() we'd not actually stop, because
> > wait_woken() doesn't test KTHREAD_SHOULD_STOP before calling schedule().
>
> Do you mean this situation?
>
> CPU 0 | CPU 1
> |
> rfcomm_run() | kthread_stop()
> ... |
> if (!test_bit(KTHREAD_SHOULD_STOP)) |
> | set_bit(KTHREAD_SHOULD_STOP)
> | wake_up_process()
> wait_woken() | wait_for_completion()
> set_current_state(INTERRUPTIBLE) |
> if (!WQ_FLAG_WOKEN) |
> schedule_timeout() |
> |
>
> Now both tasks are sleeping forever.

Yep.

> If yes, then wakeups from signals don't work either, right?

Its a kthread, there should not be any signals.

2014-10-02 13:58:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 03:52:50PM +0200, Peter Zijlstra wrote:
> > If yes, then wakeups from signals don't work either, right?
>
> Its a kthread, there should not be any signals.

That said, in the tty patch we do appear to have this problem.

Oleg, do we want something like the below on top to make that work
again?

---
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -326,8 +326,10 @@ long wait_woken(wait_queue_t *wait, unsi
* woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
* also observe all state before the wakeup.
*/
- if (!(wait->flags & WQ_FLAG_WOKEN))
- timeout = schedule_timeout(timeout);
+ if (!(wait->flags & WQ_FLAG_WOKEN)) {
+ if (___wait_is_interruptible(mode) && !signal_pending_state(mode, current))
+ timeout = schedule_timeout(timeout);
+ }
__set_current_state(TASK_RUNNING);

/*

2014-10-02 14:16:34

by Peter Hurley

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02/2014 09:58 AM, Peter Zijlstra wrote:
> On Thu, Oct 02, 2014 at 03:52:50PM +0200, Peter Zijlstra wrote:
>>> If yes, then wakeups from signals don't work either, right?
>>
>> Its a kthread, there should not be any signals.
>
> That said, in the tty patch we do appear to have this problem.

That's what I meant. And the module load patch too.

> Oleg, do we want something like the below on top to make that work
> again?
>
> ---
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -326,8 +326,10 @@ long wait_woken(wait_queue_t *wait, unsi
> * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
> * also observe all state before the wakeup.
> */
> - if (!(wait->flags & WQ_FLAG_WOKEN))
> - timeout = schedule_timeout(timeout);
> + if (!(wait->flags & WQ_FLAG_WOKEN)) {
> + if (___wait_is_interruptible(mode) && !signal_pending_state(mode, current))
> + timeout = schedule_timeout(timeout);
> + }
> __set_current_state(TASK_RUNNING);
>
> /*
>

2014-10-02 16:57:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 10:16:27AM -0400, Peter Hurley wrote:
> That's what I meant. And the module load patch too.

Ah, my bad. I thought you were talking about the rfcomm thing.

In any case, if we change wait_woken() like the below, then we can
simplify the loops by taking out their signal_pending checks and using
the wait_woken() return value instead.

---
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -326,8 +326,14 @@ long wait_woken(wait_queue_t *wait, unsi
* woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
* also observe all state before the wakeup.
*/
- if (!(wait->flags & WQ_FLAG_WOKEN))
- timeout = schedule_timeout(timeout);
+ if (!(wait->flags & WQ_FLAG_WOKEN)) {
+ if (___wait_is_interruptible(mode)) {
+ if (signal_pending_state(mode, current))
+ timeout = -ERESTARTSYS;
+ else
+ timeout = schedule_timeout(timeout);
+ }
+ }
__set_current_state(TASK_RUNNING);

/*

2014-10-02 19:14:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02, Peter Zijlstra wrote:
>
> On Thu, Oct 02, 2014 at 03:52:50PM +0200, Peter Zijlstra wrote:
> > > If yes, then wakeups from signals don't work either, right?
> >
> > Its a kthread, there should not be any signals.
>
> That said, in the tty patch we do appear to have this problem.
>
> Oleg, do we want something like the below on top to make that work
> again?
>
> ---
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -326,8 +326,10 @@ long wait_woken(wait_queue_t *wait, unsi
> * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
> * also observe all state before the wakeup.
> */
> - if (!(wait->flags & WQ_FLAG_WOKEN))
> - timeout = schedule_timeout(timeout);
> + if (!(wait->flags & WQ_FLAG_WOKEN)) {
> + if (___wait_is_interruptible(mode) && !signal_pending_state(mode, current))
> + timeout = schedule_timeout(timeout);
> + }
> __set_current_state(TASK_RUNNING);

I am a bit confused... but for what?

schedule() won't sleep if signal_pending_state(mode) anyway, so we
do not need this correctness-wise. And the caller needs to check
signal_pending() anyway.

We can probably add

if (signal_pending_state(mode, current))
return -EINTR;

at the start of wait_woken(), even before set_current_state(mode).
Then the caller can check "ret < 0" and avoid signal_pending().
Not sure this makes sense.

Oleg.

2014-10-02 19:22:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

Ah, somehow I missed this email, I already replied to the previous one.

On 10/02, Peter Zijlstra wrote:
>
> In any case, if we change wait_woken() like the below, then we can
> simplify the loops by taking out their signal_pending checks and using
> the wait_woken() return value instead.

Yes, but let me repeat,

> +++ b/kernel/sched/wait.c
> @@ -326,8 +326,14 @@ long wait_woken(wait_queue_t *wait, unsi
> * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
> * also observe all state before the wakeup.
> */
> - if (!(wait->flags & WQ_FLAG_WOKEN))
> - timeout = schedule_timeout(timeout);
> + if (!(wait->flags & WQ_FLAG_WOKEN)) {
> + if (___wait_is_interruptible(mode)) {

___wait_is_interruptible() is pointless, signal_pending_state() does
the same checks. Not to mention it will always return T in this case,
note that __builtin_constant_p(state) == F.


> + if (signal_pending_state(mode, current))
> + timeout = -ERESTARTSYS;

OK, but unless I missed something this looks overcomplicated. You can
simply do this at the start of wait_woken(). Not need to play with
current->state, no need to clear WQ_FLAG_WOKEN.

Oleg.

2014-10-02 19:49:15

by Peter Hurley

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02/2014 03:11 PM, Oleg Nesterov wrote:
> On 10/02, Peter Zijlstra wrote:
>>
>> On Thu, Oct 02, 2014 at 03:52:50PM +0200, Peter Zijlstra wrote:
>>>> If yes, then wakeups from signals don't work either, right?
>>>
>>> Its a kthread, there should not be any signals.
>>
>> That said, in the tty patch we do appear to have this problem.
>>
>> Oleg, do we want something like the below on top to make that work
>> again?
>>
>> ---
>> --- a/kernel/sched/wait.c
>> +++ b/kernel/sched/wait.c
>> @@ -326,8 +326,10 @@ long wait_woken(wait_queue_t *wait, unsi
>> * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
>> * also observe all state before the wakeup.
>> */
>> - if (!(wait->flags & WQ_FLAG_WOKEN))
>> - timeout = schedule_timeout(timeout);
>> + if (!(wait->flags & WQ_FLAG_WOKEN)) {
>> + if (___wait_is_interruptible(mode) && !signal_pending_state(mode, current))
>> + timeout = schedule_timeout(timeout);
>> + }
>> __set_current_state(TASK_RUNNING);
>
> I am a bit confused... but for what?
>
> schedule() won't sleep if signal_pending_state(mode) anyway, so we
> do not need this correctness-wise. And the caller needs to check
> signal_pending() anyway.
>
> We can probably add
>
> if (signal_pending_state(mode, current))
> return -EINTR;
>
> at the start of wait_woken(), even before set_current_state(mode).
> Then the caller can check "ret < 0" and avoid signal_pending().
> Not sure this makes sense.

The confusion is my fault; I see now that signals don't suffer from the
missed wakeup problem to which other condition testing is prone. Thanks
for setting me straight, Oleg.

So just back to the kthread wakeup problem then.

Regards,
Peter Hurley


2014-10-02 19:57:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 09:11:14PM +0200, Oleg Nesterov wrote:
> On 10/02, Peter Zijlstra wrote:
> >
> > On Thu, Oct 02, 2014 at 03:52:50PM +0200, Peter Zijlstra wrote:
> > > > If yes, then wakeups from signals don't work either, right?
> > >
> > > Its a kthread, there should not be any signals.
> >
> > That said, in the tty patch we do appear to have this problem.
> >
> > Oleg, do we want something like the below on top to make that work
> > again?
> >
> > ---
> > --- a/kernel/sched/wait.c
> > +++ b/kernel/sched/wait.c
> > @@ -326,8 +326,10 @@ long wait_woken(wait_queue_t *wait, unsi
> > * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
> > * also observe all state before the wakeup.
> > */
> > - if (!(wait->flags & WQ_FLAG_WOKEN))
> > - timeout = schedule_timeout(timeout);
> > + if (!(wait->flags & WQ_FLAG_WOKEN)) {
> > + if (___wait_is_interruptible(mode) && !signal_pending_state(mode, current))
> > + timeout = schedule_timeout(timeout);
> > + }
> > __set_current_state(TASK_RUNNING);
>
> I am a bit confused... but for what?
>
> schedule() won't sleep if signal_pending_state(mode) anyway, so we
> do not need this correctness-wise. And the caller needs to check
> signal_pending() anyway.

Urgh, I always forget how all that signal stuff works. Yes you're right,
we check that right in __schedule().

I'll just make all what I did go away and we'll keep it simple like it
was. Sorry for the confusion.

2014-10-02 20:13:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/02, Peter Zijlstra wrote:
>
> On Thu, Oct 02, 2014 at 02:31:50PM +0200, Peter Zijlstra wrote:
> > @@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
> >
> > static int rfcomm_run(void *unused)
> > {
> > + DEFINE_WAIT_FUNC(wait, woken_wake_function);
> > BT_DBG("");
> >
> > set_user_nice(current, -10);
> >
> > rfcomm_add_listener(BDADDR_ANY);
> >
> > - while (1) {
> > - set_current_state(TASK_INTERRUPTIBLE);
> > -
> > - if (kthread_should_stop())
> > - break;
> > + add_wait_queue(&rfcomm_wq, &wait);
> > + while (!kthread_should_stop()) {
> >
> > /* Process stuff */
> > rfcomm_process_sessions();
> >
> > - schedule();
> > + wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> > }
> > - __set_current_state(TASK_RUNNING);
> > + remove_wait_queue(&rfcomm_wq, &wait);
> >
> > rfcomm_kill_listener();
> >
>
> Hmm, I think there's a problem there. If someone were to do
> kthread_stop() before wait_woken() we'd not actually stop, because
> wait_woken() doesn't test KTHREAD_SHOULD_STOP before calling schedule().
>
> We can't unconditionally put a kthread_should_stop() in because
> to_kthread() would explode on a !kthread. The other obvious solution is
> adding a second function, something like wait_woken_or_stop(), but that
> appears somewhat ugly to me.
>
> Oleg, do you see another solution?

You know, I already thought about the patch below for other reasons, it
can probably simplify other users of kthread_should_stop(). Because this
way we can rely on the signal checks in schedule(). (Just in case, the
patch is not complete, see TODO).

As for rfcomm_run(), perhaps it can ise it too?

set_kthread_wants_signal(true);

add_wait_queue(&rfcomm_wq, &wait);
for (;;) {
// This is only possible if kthread_should_stop() == T
if (signal_pending(current))
break;

rfcomm_process_sessions();
wait_woken(TASK_INTERRUPTIBLE);
}

Of course, this assumes that rfcomm_process_sessions() can't do something
"really bad" if signal_pending() is true.

What do you think?

Oleg.

--- x/kernel/kthread.c
+++ x/kernel/kthread.c
@@ -49,6 +49,7 @@ struct kthread {
enum KTHREAD_BITS {
KTHREAD_IS_PER_CPU = 0,
KTHREAD_SHOULD_STOP,
+ KTHREAD_WANTS_SIGNAL,
KTHREAD_SHOULD_PARK,
KTHREAD_IS_PARKED,
};
@@ -442,6 +443,21 @@ int kthread_park(struct task_struct *k)
return ret;
}

+void set_kthread_wants_signal(bool on)
+{
+ unsigned long *kflags = &to_kthread(current)->flags;
+ unsigned long irqflags;
+
+ if (on) {
+ set_bit(KTHREAD_WANTS_SIGNAL, kflags);
+ } else {
+ spin_lock_irqsave(&current->sighand->siglock, irqflags);
+ clear_bit(KTHREAD_WANTS_SIGNAL, kflags);
+ recalc_sigpending();
+ spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
+ }
+}
+
/**
* kthread_stop - stop a thread created by kthread_create().
* @k: thread created by kthread_create().
@@ -469,6 +485,9 @@ int kthread_stop(struct task_struct *k)
if (kthread) {
set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
__kthread_unpark(k, kthread);
+ // TODO: this is racy, we need ->siglock.
+ if (test_bit(KTHREAD_WANTS_SIGNAL, &to_kthread(k)->flags))
+ set_tsk_thread_flag(k, TIF_SIGPENDING);
wake_up_process(k);
wait_for_completion(&kthread->exited);
}

2014-10-03 11:50:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Thu, Oct 02, 2014 at 10:10:20PM +0200, Oleg Nesterov wrote:
> You know, I already thought about the patch below for other reasons, it
> can probably simplify other users of kthread_should_stop(). Because this
> way we can rely on the signal checks in schedule(). (Just in case, the
> patch is not complete, see TODO).
>
> As for rfcomm_run(), perhaps it can ise it too?
>
> set_kthread_wants_signal(true);
>
> add_wait_queue(&rfcomm_wq, &wait);
> for (;;) {
> // This is only possible if kthread_should_stop() == T

True because kthreads SIG_IGN everything, right?

> if (signal_pending(current))
> break;
>
> rfcomm_process_sessions();
> wait_woken(TASK_INTERRUPTIBLE);
> }
>
> Of course, this assumes that rfcomm_process_sessions() can't do something
> "really bad" if signal_pending() is true.

So from what I can think of, everything that does an INTERRUPTIBLE sleep
will 'malfunction' after that, right? Which might be quite a lot
actually.

> What do you think?

Interesting approach, but somewhat risky I tihnk, due to that
INTERRUPTIBLE thing.

> --- x/kernel/kthread.c
> +++ x/kernel/kthread.c
> @@ -49,6 +49,7 @@ struct kthread {
> enum KTHREAD_BITS {
> KTHREAD_IS_PER_CPU = 0,
> KTHREAD_SHOULD_STOP,
> + KTHREAD_WANTS_SIGNAL,
> KTHREAD_SHOULD_PARK,
> KTHREAD_IS_PARKED,
> };
> @@ -442,6 +443,21 @@ int kthread_park(struct task_struct *k)
> return ret;
> }
>
> +void set_kthread_wants_signal(bool on)
> +{
> + unsigned long *kflags = &to_kthread(current)->flags;
> + unsigned long irqflags;
> +
> + if (on) {
> + set_bit(KTHREAD_WANTS_SIGNAL, kflags);
> + } else {
> + spin_lock_irqsave(&current->sighand->siglock, irqflags);
> + clear_bit(KTHREAD_WANTS_SIGNAL, kflags);
> + recalc_sigpending();
> + spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
> + }
> +}
> +
> /**
> * kthread_stop - stop a thread created by kthread_create().
> * @k: thread created by kthread_create().
> @@ -469,6 +485,9 @@ int kthread_stop(struct task_struct *k)
> if (kthread) {
> set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> __kthread_unpark(k, kthread);
> + // TODO: this is racy, we need ->siglock.
> + if (test_bit(KTHREAD_WANTS_SIGNAL, &to_kthread(k)->flags))
> + set_tsk_thread_flag(k, TIF_SIGPENDING);

Right, but taking that should not really be a problem afaict, this is a
slow path if ever there was one.

> wake_up_process(k);
> wait_for_completion(&kthread->exited);
> }
>

2014-10-03 18:00:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/03, Peter Zijlstra wrote:
>
> On Thu, Oct 02, 2014 at 10:10:20PM +0200, Oleg Nesterov wrote:
>
> > As for rfcomm_run(), perhaps it can ise it too?
> >
> > set_kthread_wants_signal(true);
> >
> > add_wait_queue(&rfcomm_wq, &wait);
> > for (;;) {
> > // This is only possible if kthread_should_stop() == T
>
> True because kthreads SIG_IGN everything, right?

Yes,

> > if (signal_pending(current))
> > break;
> >
> > rfcomm_process_sessions();
> > wait_woken(TASK_INTERRUPTIBLE);
> > }
> >
> > Of course, this assumes that rfcomm_process_sessions() can't do something
> > "really bad" if signal_pending() is true.
>
> So from what I can think of, everything that does an INTERRUPTIBLE sleep
> will 'malfunction' after that, right? Which might be quite a lot
> actually.

Yes.

> > What do you think?
>
> Interesting approach, but somewhat risky I tihnk, due to that
> INTERRUPTIBLE thing.

OK, this is fixable. rfcomm_run() can do

add_wait_queue(&rfcomm_wq, &wait);
while (!kthread_should_stop()) {
rfcomm_process_sessions();

set_kthread_wants_signal(true);
wait_woken(TASK_INTERRUPTIBLE);
set_kthread_wants_signal(false);
}
remove_wait_queue(&rfcomm_wq, &wait);

Or. perhaps we can change wait_woken

- set_current_state(mode);
+ if (mode)
+ set_current_state(mode);


then rfcomm_run() can do

for (;;) {
rfcomm_process_sessions();

set_current_state(TASK_INTERRUPTIBLE);
if (kthread_should_stop())
break;
wait_woken(0);
}

Or perhaps we can split wait_woken() into 2 helpers,

static inline long wait_woken(wq, mode, timeout)
{
set_current_state(mode);
schedule_woken(wq, timeout); // does the rest
}

to avoid "mode == 0" hack; rfcomm_run() should use schedule_woken().

What do you think?

Oleg.

2014-10-03 19:34:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/03, Oleg Nesterov wrote:
>
> OK, this is fixable. rfcomm_run() can do
>
> add_wait_queue(&rfcomm_wq, &wait);
> while (!kthread_should_stop()) {
> rfcomm_process_sessions();
>
> set_kthread_wants_signal(true);
> wait_woken(TASK_INTERRUPTIBLE);
> set_kthread_wants_signal(false);
> }
> remove_wait_queue(&rfcomm_wq, &wait);

And in this case set_kthread_wants_signal(true) needs to avoid the races
with kthread_stop() too. See the hopefully complete patch at the end.

However,

> Or. perhaps we can change wait_woken
>
> - set_current_state(mode);
> + if (mode)
> + set_current_state(mode);
>
>
> then rfcomm_run() can do
>
> for (;;) {
> rfcomm_process_sessions();
>
> set_current_state(TASK_INTERRUPTIBLE);
> if (kthread_should_stop())
> break;
> wait_woken(0);
> }
>
> Or perhaps we can split wait_woken() into 2 helpers,
>
> static inline long wait_woken(wq, mode, timeout)
> {
> set_current_state(mode);
> schedule_woken(wq, timeout); // does the rest
> }
>
> to avoid "mode == 0" hack; rfcomm_run() should use schedule_woken().

probably this makes more sense in this particular case...

Oleg.
---

--- x/kernel/kthread.c
+++ x/kernel/kthread.c
@@ -48,6 +48,7 @@ struct kthread {

enum KTHREAD_BITS {
KTHREAD_IS_PER_CPU = 0,
+ KTHREAD_WANTS_SIGNAL,
KTHREAD_SHOULD_STOP,
KTHREAD_SHOULD_PARK,
KTHREAD_IS_PARKED,
@@ -442,6 +443,45 @@ int kthread_park(struct task_struct *k)
return ret;
}

+void set_kthread_wants_signal(bool on)
+{
+ struct kthread *kthread = to_kthread(current);
+ unsigned long flags;
+
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ if (on) {
+ set_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);
+ smp_mb__after_atomic();
+ if (kthread_should_stop())
+ set_thread_flag(TIF_SIGPENDING);
+ } else {
+ clear_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);
+ recalc_sigpending();
+ }
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
+}
+
+static void kthread_kill(struct task_struct *k, struct kthread *kthread)
+{
+ smp_mb__before_atomic();
+ if (test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags)) {
+ unsigned long flags;
+ bool kill = true;
+
+ if (lock_task_sighand(k, &flags)) {
+ kill = test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);
+ if (kill)
+ signal_wake_up(k, 0);
+ unlock_task_sighand(k, &flags);
+ }
+
+ if (kill)
+ return;
+ }
+
+ wake_up_process(k);
+}
+
/**
* kthread_stop - stop a thread created by kthread_create().
* @k: thread created by kthread_create().
@@ -469,7 +509,7 @@ int kthread_stop(struct task_struct *k)
if (kthread) {
set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
__kthread_unpark(k, kthread);
- wake_up_process(k);
+ kthread_kill(k, kthread);
wait_for_completion(&kthread->exited);
}
ret = k->exit_code;

2014-10-04 08:43:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Fri, Oct 03, 2014 at 09:30:29PM +0200, Oleg Nesterov wrote:
> > Or. perhaps we can change wait_woken
> >
> > - set_current_state(mode);
> > + if (mode)
> > + set_current_state(mode);
> >
> >
> > then rfcomm_run() can do
> >
> > for (;;) {
> > rfcomm_process_sessions();
> >
> > set_current_state(TASK_INTERRUPTIBLE);
> > if (kthread_should_stop())
> > break;
> > wait_woken(0);
> > }

> probably this makes more sense in this particular case...

Right, in which case the below needs a different justification, but you
said you were already thinking about it, so there must be something.

And clearly it needs a changelog to begin with :-)

A few nits below.

> --- x/kernel/kthread.c
> +++ x/kernel/kthread.c
> @@ -48,6 +48,7 @@ struct kthread {
>
> enum KTHREAD_BITS {
> KTHREAD_IS_PER_CPU = 0,
> + KTHREAD_WANTS_SIGNAL,
> KTHREAD_SHOULD_STOP,
> KTHREAD_SHOULD_PARK,
> KTHREAD_IS_PARKED,
> @@ -442,6 +443,45 @@ int kthread_park(struct task_struct *k)
> return ret;
> }
>
> +void set_kthread_wants_signal(bool on)
> +{
> + struct kthread *kthread = to_kthread(current);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&current->sighand->siglock, flags);
> + if (on) {
> + set_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);

All barriers must come with a comment :-)

> + smp_mb__after_atomic();
> + if (kthread_should_stop())
> + set_thread_flag(TIF_SIGPENDING);
> + } else {
> + clear_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);
> + recalc_sigpending();
> + }
> + spin_unlock_irqrestore(&current->sighand->siglock, flags);
> +}
> +
> +static void kthread_kill(struct task_struct *k, struct kthread *kthread)
> +{
> + smp_mb__before_atomic();

test_bit isn't actually an atomic op so this barrier is 'wrong'. If you
need an MB there smp_mb() it is. Again, comment is missing.

> + if (test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags)) {
> + unsigned long flags;
> + bool kill = true;
> +
> + if (lock_task_sighand(k, &flags)) {

Since we do the double test thing here, with the set side also done
under the lock, so we really need a barrier above?

> + kill = test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags);
> + if (kill)
> + signal_wake_up(k, 0);
> + unlock_task_sighand(k, &flags);
> + }
> +
> + if (kill)
> + return;
> + }
> +
> + wake_up_process(k);
> +}
> +
> /**
> * kthread_stop - stop a thread created by kthread_create().
> * @k: thread created by kthread_create().
> @@ -469,7 +509,7 @@ int kthread_stop(struct task_struct *k)
> if (kthread) {
> set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> __kthread_unpark(k, kthread);
> - wake_up_process(k);
> + kthread_kill(k, kthread);
> wait_for_completion(&kthread->exited);
> }
> ret = k->exit_code;
>

2014-10-04 08:45:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Fri, Oct 03, 2014 at 07:56:54PM +0200, Oleg Nesterov wrote:
> Or. perhaps we can change wait_woken
>
> - set_current_state(mode);
> + if (mode)
> + set_current_state(mode);
>
>
> then rfcomm_run() can do
>
> for (;;) {
> rfcomm_process_sessions();
>
> set_current_state(TASK_INTERRUPTIBLE);
> if (kthread_should_stop())
> break;
> wait_woken(0);
> }
>
> Or perhaps we can split wait_woken() into 2 helpers,
>
> static inline long wait_woken(wq, mode, timeout)
> {
> set_current_state(mode);
> schedule_woken(wq, timeout); // does the rest
> }
>
> to avoid "mode == 0" hack; rfcomm_run() should use schedule_woken().
>
> What do you think?

Clever, I'm not entirely sure which I prefer, I think I'm leaning
towards the first one with the !mode hack, but let me sit on that for a
little while.

2014-10-06 00:28:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 03, 2014 at 09:30:29PM +0200, Oleg Nesterov wrote:
> > > Or. perhaps we can change wait_woken
> > >
> > > - set_current_state(mode);
> > > + if (mode)
> > > + set_current_state(mode);
> > >
> > >
> > > then rfcomm_run() can do
> > >
> > > for (;;) {
> > > rfcomm_process_sessions();
> > >
> > > set_current_state(TASK_INTERRUPTIBLE);
> > > if (kthread_should_stop())
> > > break;
> > > wait_woken(0);
> > > }
>
> > probably this makes more sense in this particular case...
>
> Right, in which case the below needs a different justification, but you
> said you were already thinking about it, so there must be something.
>
> And clearly it needs a changelog to begin with :-)

Yes, and the comments ;)

I showed this patch only to complete the discussion, I am not going to
send it now.

But thanks for the review!

> > +static void kthread_kill(struct task_struct *k, struct kthread *kthread)
> > +{
> > + smp_mb__before_atomic();
>
> test_bit isn't actually an atomic op so this barrier is 'wrong'. If you
> need an MB there smp_mb() it is.

Hmm. I specially checked Documentation/memory-barriers.txt,

(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();

These are for use with atomic (such as add, subtract, increment and
decrement) functions that don't return a value, especially when used for
reference counting. These functions do not imply memory barriers.

These are also used for atomic bitop functions that do not return a
value (such as set_bit and clear_bit).
^^^^^^^^^^^^^^^^^^^^^

Either you or memory-barriers.txt should be fixed ;)

> Again, comment is missing.

Yes, yes, we need the comments in set_kthread_wants_signal() and kthread_kill()
to explain that they set/check KTHREAD_WANTS_SIGNAL/KTHREAD_SHOULD_STOP in
opposite order, and we need mb's to separate STORE/LOAD.

And probably set_bit(KTHREAD_SHOULD_STOP) should be moved into kthread_kill()
to make this more clear. (along with __kthread_unpark(), but this reminds me
that __kthread_unpark() should die imho).

>
> > + if (test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags)) {
> > + unsigned long flags;
> > + bool kill = true;
> > +
> > + if (lock_task_sighand(k, &flags)) {
>
> Since we do the double test thing here, with the set side also done
> under the lock, so we really need a barrier above?

Yes, otherwise set_kthread_wants_signal() can miss a signal. And note
that the 2nd check is only needed to ensure that we can not race
with set_kthread_wants_signal(false).

BUT!!! I have to admit that I simply do not know if there is any arch

set_bit(&word, X);
test_bit(&word, Y);

which actually needs mb() in between, the word is the same. Probably
not.

Oleg.

2014-10-06 09:19:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Mon, Oct 06, 2014 at 02:25:09AM +0200, Oleg Nesterov wrote:
> Yes, and the comments ;)
>
> I showed this patch only to complete the discussion, I am not going to
> send it now.

Fair enough :-)

> But thanks for the review!
>
> > > +static void kthread_kill(struct task_struct *k, struct kthread *kthread)
> > > +{
> > > + smp_mb__before_atomic();
> >
> > test_bit isn't actually an atomic op so this barrier is 'wrong'. If you
> > need an MB there smp_mb() it is.
>
> Hmm. I specially checked Documentation/memory-barriers.txt,
>
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> These are for use with atomic (such as add, subtract, increment and
> decrement) functions that don't return a value, especially when used for
> reference counting. These functions do not imply memory barriers.
>
> These are also used for atomic bitop functions that do not return a
> value (such as set_bit and clear_bit).
> ^^^^^^^^^^^^^^^^^^^^^
>
> Either you or memory-barriers.txt should be fixed ;)

Its in there, just not explicitly. All those functions listed are
read-modify-write ops, test_bit() is not, its just a read. But yes I
suppose we could make that more explicit.

Also test_bit() obviously does return a value, so it doesn't fall in the
{set,clear}_bit() class.

Does the change below clarify things?

> > > + if (test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags)) {
> > > + unsigned long flags;
> > > + bool kill = true;
> > > +
> > > + if (lock_task_sighand(k, &flags)) {
> >
> > Since we do the double test thing here, with the set side also done
> > under the lock, so we really need a barrier above?
>
> Yes, otherwise set_kthread_wants_signal() can miss a signal. And note
> that the 2nd check is only needed to ensure that we can not race
> with set_kthread_wants_signal(false).
>
> BUT!!! I have to admit that I simply do not know if there is any arch
>
> set_bit(&word, X);
> test_bit(&word, Y);
>
> which actually needs mb() in between, the word is the same. Probably
> not.

DEC Alpha? Wasn't it the problem there that dependencies didn't actually
work as expected?

Added Paul to Cc.

---
Documentation/memory-barriers.txt | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 22a969cdd476..0d97c99ad957 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1594,12 +1594,9 @@ CPU from reordering them.
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();

- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
-
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are for use with atomic/bitop (r-m-w) functions that don't return
+ a value (eg. atomic_{add,sub,inc,dec}(), {set,clear}_bit()). These
+ functions do not imply memory barriers.

As an example, consider a piece of code that marks an object as being dead
and then decrements the object's reference count:

2014-10-06 10:59:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On Mon, Oct 06, 2014 at 11:19:15AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 06, 2014 at 02:25:09AM +0200, Oleg Nesterov wrote:
> > Yes, and the comments ;)
> >
> > I showed this patch only to complete the discussion, I am not going to
> > send it now.
>
> Fair enough :-)
>
> > But thanks for the review!
> >
> > > > +static void kthread_kill(struct task_struct *k, struct kthread *kthread)
> > > > +{
> > > > + smp_mb__before_atomic();
> > >
> > > test_bit isn't actually an atomic op so this barrier is 'wrong'. If you
> > > need an MB there smp_mb() it is.
> >
> > Hmm. I specially checked Documentation/memory-barriers.txt,
> >
> > (*) smp_mb__before_atomic();
> > (*) smp_mb__after_atomic();
> >
> > These are for use with atomic (such as add, subtract, increment and
> > decrement) functions that don't return a value, especially when used for
> > reference counting. These functions do not imply memory barriers.
> >
> > These are also used for atomic bitop functions that do not return a
> > value (such as set_bit and clear_bit).
> > ^^^^^^^^^^^^^^^^^^^^^
> >
> > Either you or memory-barriers.txt should be fixed ;)
>
> Its in there, just not explicitly. All those functions listed are
> read-modify-write ops, test_bit() is not, its just a read. But yes I
> suppose we could make that more explicit.
>
> Also test_bit() obviously does return a value, so it doesn't fall in the
> {set,clear}_bit() class.
>
> Does the change below clarify things?
>
> > > > + if (test_bit(KTHREAD_WANTS_SIGNAL, &kthread->flags)) {
> > > > + unsigned long flags;
> > > > + bool kill = true;
> > > > +
> > > > + if (lock_task_sighand(k, &flags)) {
> > >
> > > Since we do the double test thing here, with the set side also done
> > > under the lock, so we really need a barrier above?
> >
> > Yes, otherwise set_kthread_wants_signal() can miss a signal. And note
> > that the 2nd check is only needed to ensure that we can not race
> > with set_kthread_wants_signal(false).
> >
> > BUT!!! I have to admit that I simply do not know if there is any arch
> >
> > set_bit(&word, X);
> > test_bit(&word, Y);
> >
> > which actually needs mb() in between, the word is the same. Probably
> > not.
>
> DEC Alpha? Wasn't it the problem there that dependencies didn't actually
> work as expected?

This looks to me to be an issue of cache coherence rather than
dependency ordering, so I would expect that DEC Alpha would respect
the ordering.

Thanx, Paul

> Added Paul to Cc.
>
> ---
> Documentation/memory-barriers.txt | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 22a969cdd476..0d97c99ad957 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1594,12 +1594,9 @@ CPU from reordering them.
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> + These are for use with atomic/bitop (r-m-w) functions that don't return
> + a value (eg. atomic_{add,sub,inc,dec}(), {set,clear}_bit()). These
> + functions do not imply memory barriers.
>
> As an example, consider a piece of code that marks an object as being dead
> and then decrements the object's reference count:
>

2014-10-06 16:24:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfcomm_run] WARNING: CPU: 1 PID: 79 at kernel/sched/core.c:7156 __might_sleep()

On 10/06, Peter Zijlstra wrote:
>
> On Mon, Oct 06, 2014 at 02:25:09AM +0200, Oleg Nesterov wrote:
> >
> > Hmm. I specially checked Documentation/memory-barriers.txt,
> >
> > (*) smp_mb__before_atomic();
> > (*) smp_mb__after_atomic();
> >
> > These are for use with atomic (such as add, subtract, increment and
> > decrement) functions that don't return a value, especially when used for
> > reference counting. These functions do not imply memory barriers.
> >
> > These are also used for atomic bitop functions that do not return a
> > value (such as set_bit and clear_bit).
> > ^^^^^^^^^^^^^^^^^^^^^
> >
> > Either you or memory-barriers.txt should be fixed ;)

Heh.

> Its in there, just not explicitly. All those functions listed are
> read-modify-write ops, test_bit() is not, its just a read.

OOPS! I was hypnotized by "_bit" suffix, I guess.

Of course, of course, test_bit() must be a plain LOAD in any case, can't
understand what I was thinking about.

So in this particular case kthread_kill() needs smp_mb__AFTER_atomic(),
and "after" applies to set_bit(KTHREAD_SHOULD_STOP).

> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1594,12 +1594,9 @@ CPU from reordering them.
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> + These are for use with atomic/bitop (r-m-w) functions that don't return
> + a value (eg. atomic_{add,sub,inc,dec}(), {set,clear}_bit()). These
> + functions do not imply memory barriers.

Thanks!

Oleg.