Subject: [PATCH] Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.

Commit dec2c92880cc5435381d50e3045ef018a762a917 ("Bluetooth: hci_ldisc:
Use rwlocking to avoid closing proto races") introduced locks in
hci_ldisc that are held while calling the proto functions. These locks
are rwlock's, and hence do not allow sleeping while they are held.
However, the proto functions that hci_bcm registers use mutexes and
hence need to be able to sleep.

In more detail: hci_uart_tty_receive() and hci_uart_dequeue() both
acquire the rwlock, after which they call proto->recv() and
proto->dequeue(), respectively. In the case of hci_bcm these point to
bcm_recv() and bcm_dequeue(). The latter both acquire the
bcm_device_lock, which is a mutex, so doing so results in a call to
might_sleep(). But since we're holding a rwlock in hci_ldisc, that
results in the following BUG (this for the dequeue case - a similar
one for the receive case is omitted for brevity):

BUG: sleeping function called from invalid context at kernel/locking/mutex.c
in_atomic(): 1, irqs_disabled(): 0, pid: 7303, name: kworker/7:3
INFO: lockdep is turned off.
CPU: 7 PID: 7303 Comm: kworker/7:3 Tainted: G W OE 4.13.2+ #17
Hardware name: Apple Inc. MacBookPro13,3/Mac-A5C67F76ED83108C, BIOS MBP133.8
Workqueue: events hci_uart_write_work [hci_uart]
Call Trace:
dump_stack+0x8e/0xd6
___might_sleep+0x164/0x250
__might_sleep+0x4a/0x80
__mutex_lock+0x59/0xa00
? lock_acquire+0xa3/0x1f0
? lock_acquire+0xa3/0x1f0
? hci_uart_write_work+0xd3/0x160 [hci_uart]
mutex_lock_nested+0x1b/0x20
? mutex_lock_nested+0x1b/0x20
bcm_dequeue+0x21/0xc0 [hci_uart]
hci_uart_write_work+0xe6/0x160 [hci_uart]
process_one_work+0x253/0x6a0
worker_thread+0x4d/0x3b0
kthread+0x133/0x150

We can't replace the mutex in hci_bcm, because there are other calls
there that might sleep. Therefore this replaces the rwlock's in
hci_ldisc with rw_semaphore's (which allow sleeping). This is a safer
approach anyway as it reduces the restrictions on the proto callbacks.
Also, because acquiring write-lock is very rare compared to acquiring
the read-lock, the percpu variant of rw_semaphore is used.

Lastly, the read-lock in the hci_uart_tx_wakeup() callback now needed
to be removed because that function is called from an IRQ context. But
since it doesn't actually call any proto functions, instead just
queueing the work, and the other operations are atomic bit operations,
this is ok.

Signed-off-by: Ronald Tschal?r <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Cc: Dean Jenkins <[email protected]>
---
drivers/bluetooth/hci_ldisc.c | 31 +++++++++++++------------------
drivers/bluetooth/hci_uart.h | 2 +-
2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index eec95019f15c..08bcb1239aa0 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -115,12 +115,12 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
struct sk_buff *skb = hu->tx_skb;

if (!skb) {
- read_lock(&hu->proto_lock);
+ percpu_down_read(&hu->proto_lock);

if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
skb = hu->proto->dequeue(hu);

- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);
} else {
hu->tx_skb = NULL;
}
@@ -130,8 +130,6 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)

int hci_uart_tx_wakeup(struct hci_uart *hu)
{
- read_lock(&hu->proto_lock);
-
if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
goto no_schedule;

@@ -145,8 +143,6 @@ int hci_uart_tx_wakeup(struct hci_uart *hu)
schedule_work(&hu->write_work);

no_schedule:
- read_unlock(&hu->proto_lock);
-
return 0;
}
EXPORT_SYMBOL_GPL(hci_uart_tx_wakeup);
@@ -247,12 +243,12 @@ static int hci_uart_flush(struct hci_dev *hdev)
tty_ldisc_flush(tty);
tty_driver_flush_buffer(tty);

- read_lock(&hu->proto_lock);
+ percpu_down_read(&hu->proto_lock);

if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
hu->proto->flush(hu);

- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);

return 0;
}
@@ -275,15 +271,15 @@ static int hci_uart_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
BT_DBG("%s: type %d len %d", hdev->name, hci_skb_pkt_type(skb),
skb->len);

- read_lock(&hu->proto_lock);
+ percpu_down_read(&hu->proto_lock);

if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);
return -EUNATCH;
}

hu->proto->enqueue(hu, skb);
- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);

hci_uart_tx_wakeup(hu);

@@ -486,7 +482,7 @@ static int hci_uart_tty_open(struct tty_struct *tty)
INIT_WORK(&hu->init_ready, hci_uart_init_work);
INIT_WORK(&hu->write_work, hci_uart_write_work);

- rwlock_init(&hu->proto_lock);
+ percpu_init_rwsem(&hu->proto_lock);

/* Flush any pending characters in the driver */
tty_driver_flush_buffer(tty);
@@ -503,7 +499,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
{
struct hci_uart *hu = tty->disc_data;
struct hci_dev *hdev;
- unsigned long flags;

BT_DBG("tty %p", tty);

@@ -520,9 +515,9 @@ static void hci_uart_tty_close(struct tty_struct *tty)
cancel_work_sync(&hu->write_work);

if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
- write_lock_irqsave(&hu->proto_lock, flags);
+ percpu_down_write(&hu->proto_lock);
clear_bit(HCI_UART_PROTO_READY, &hu->flags);
- write_unlock_irqrestore(&hu->proto_lock, flags);
+ percpu_up_write(&hu->proto_lock);

if (hdev) {
if (test_bit(HCI_UART_REGISTERED, &hu->flags))
@@ -582,10 +577,10 @@ static void hci_uart_tty_receive(struct tty_struct *tty, const u8 *data,
if (!hu || tty != hu->tty)
return;

- read_lock(&hu->proto_lock);
+ percpu_down_read(&hu->proto_lock);

if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);
return;
}

@@ -593,7 +588,7 @@ static void hci_uart_tty_receive(struct tty_struct *tty, const u8 *data,
* tty caller
*/
hu->proto->recv(hu, data, count);
- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);

if (hu->hdev)
hu->hdev->stat.byte_rx += count;
diff --git a/drivers/bluetooth/hci_uart.h b/drivers/bluetooth/hci_uart.h
index d9cd95d81149..66e8c68e4607 100644
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -87,7 +87,7 @@ struct hci_uart {
struct work_struct write_work;

const struct hci_uart_proto *proto;
- rwlock_t proto_lock; /* Stop work for proto close */
+ struct percpu_rw_semaphore proto_lock; /* Stop work for proto close */
void *priv;

struct sk_buff *tx_skb;
--
2.13.6


2017-10-22 14:57:50

by Dean Jenkins

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.

Hi Ronald,

Sorry for my delay in replying to you.

On 18/10/17 03:00, Life is hard, and then you die wrote:
>
>> In fact, hci_uart_tty_close() is really a bit of a mess because it
>> progressively removes resources. It is is based on old code which has been
>> patched over the many years. Therefore it has kept code which is probably
>> obsolete or duplicated. Ideally, hci_uart_tty_close() should be rewritten.
>>
>> For example, there is a call
>> cancel_work_sync(&hu->write_work);
>> in  hci_uart_tty_close() which at first glance seems to be helpful but it is
>> flawed because hci_uart_tx_wakeup() can schedule a new work item AFTER
>> cancel_work_sync(&hu->write_work) runs. Therefore, locking is needed to
>> prevent hci_uart_tx_wakeup() from being scheduled whilst
>> HCI_UART_PROTO_READY is being cleared in hci_uart_tty_close().
> Actually, I think there's still problem: in hci_uart_tty_close()
> cancel_work_sync() is called before the HCI_UART_PROTO_READY flag is
> cleared, opening the following race:
>
> P1 P2
> cancel_work_sync()
> hci_uart_tx_wakeup()
> clear_bit(HCI_UART_PROTO_READY)
> clean up
> hci_uart_write_work()
Yes, this looks bad. There is some protection in hci_uart_write_work()
because hci_uart_dequeue(hu) checks HCI_UART_PROTO_READY but it will not
be foolproof due to resources being removed by hci_uart_tty_close().
>
> So AFAICT cancel_work_sync() needs to be done after clearing the flag:
>
> if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
> write_lock_irqsave(&hu->proto_lock, flags);
> clear_bit(HCI_UART_PROTO_READY, &hu->flags);
> write_unlock_irqrestore(&hu->proto_lock, flags);
>
> cancel_work_sync(&hu->write_work); // <---
I agree with this solution. I was going to suggest this but you beat me
to it ;-)
>
> if (hdev) {
>
> (if HCI_UART_PROTO_READY is already clear, then no work can be pending,
> so no need to cancel work in that case).
>
I agree with your statement.

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

Subject: Re: [PATCH] Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.


Hi Dean,

apologies for the slow reply, and thank you for the detailed response.

On Mon, Oct 16, 2017 at 06:08:37PM +0100, Dean Jenkins wrote:
>
> On 15/10/17 11:40, =?UTF-8?q?Ronald=20Tschal=C3=A4r?= wrote:
> >Lastly, the read-lock in the hci_uart_tx_wakeup() callback now needed
> >to be removed because that function is called from an IRQ context. But
> >since it doesn't actually call any proto functions, instead just
> >queueing the work, and the other operations are atomic bit operations,
> >this is ok.
[snip]
> There is at least 1 race condition to consider. The atomic variables do not
> help because the codeblock as a whole is not atomic. Hence locking is needed
> as follows.
>
> The issue is that hci_uart_tty_close() runs asynchronously to
> hci_uart_tx_wakeup() which is shown below (assuming a SMP system).
>
> int hci_uart_tx_wakeup(struct hci_uart *hu)
> {
> ??????? read_lock(&hu->proto_lock);
>
> In parallel, hci_uart_tty_close() can be running here:
>
> ??????? if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
> ??????????????? goto no_schedule;
>
> In parallel, hci_uart_tty_close() can be running here:
>
> ??????? if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {
>
> In parallel, hci_uart_tty_close() can be running here:
>
> ??????????????? set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
> ??????????????? goto no_schedule;
> ??????? }
>
> ??????? BT_DBG("");
>
> In parallel, hci_uart_tty_close() can be running here:
>
> ??????? schedule_work(&hu->write_work);
>
> In parallel, hci_uart_tty_close() can be running here:
>
> no_schedule:
> ??????? read_unlock(&hu->proto_lock);
>
> ??????? return 0;
> }
>
> hci_uart_tty_close() progressively removes the resources needed by the
> scheduled work queue hu->write_work which runs hci_uart_write_work(). Also
> there is a delay between the scheduling and hci_uart_write_work actually
> running. This means hci_uart_write_work() can race with
> hci_uart_tty_close(), sometimes causing hci_uart_tty_close() to crash, for
> example due to the write buffer no longer being there.
>
> static void hci_uart_write_work(struct work_struct *work)
> {
> <snipped>
> ??? ??? set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> ??? ??? len = tty->ops->write(tty, skb->data, skb->len);??? <== can crash
> inside the write function pointer as the write buffer is no longer valid
> ??? ??? hdev->stat.byte_tx += len;??? <== can crash here as hdev now invalid
> <snipped>
> }
>
> Therefore, a robust solution is to lock out hci_uart_tty_close() when
> hci_uart_tx_wakeup() runs, that is the reason why read_lock(&hu->proto_lock)
> is used in hci_uart_write_work(). The atomic flag HCI_UART_PROTO_READY is
> prevented from being cleared whilst hci_uart_tx_wakeup() runs.

Got it, thanks. So to summarize, the core issue is that in
hci_uart_tx_wakeup() these two need to be atomic

??????if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
????????????? goto no_schedule;

??????schedule_work(&hu->write_work);

so that new work won't be scheduled after close.

I think then that this can fixed by using the trylock variants here:

int hci_uart_tx_wakeup(struct hci_uart *hu)
{
- read_lock(&hu->proto_lock);
+ /* This may be called in an IRQ context, so we can't sleep. Therefore
+ * we only try to acquire the lock, and if that fails we assume the
+ * tty is being closed because that is the only time the write lock is
+ * acquired. If, however, at some point in the future the write lock
+ * is also acquired in other situations, then this must be revisited.
+ */
+ if (!percpu_down_read_trylock(&hu->proto_lock))
+ return 0;

if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
goto no_schedule;
@@ -145,8 +143,6 @@ int hci_uart_tx_wakeup(struct hci_uart *hu)
schedule_work(&hu->write_work);

no_schedule:
- read_unlock(&hu->proto_lock);
+ percpu_up_read(&hu->proto_lock);

return 0;
}


> In fact, hci_uart_tty_close() is really a bit of a mess because it
> progressively removes resources. It is is based on old code which has been
> patched over the many years. Therefore it has kept code which is probably
> obsolete or duplicated. Ideally, hci_uart_tty_close() should be rewritten.
>
> For example, there is a call
> cancel_work_sync(&hu->write_work);
> in? hci_uart_tty_close() which at first glance seems to be helpful but it is
> flawed because hci_uart_tx_wakeup() can schedule a new work item AFTER
> cancel_work_sync(&hu->write_work) runs. Therefore, locking is needed to
> prevent hci_uart_tx_wakeup() from being scheduled whilst
> HCI_UART_PROTO_READY is being cleared in hci_uart_tty_close().

Actually, I think there's still problem: in hci_uart_tty_close()
cancel_work_sync() is called before the HCI_UART_PROTO_READY flag is
cleared, opening the following race:

P1 P2
cancel_work_sync()
hci_uart_tx_wakeup()
clear_bit(HCI_UART_PROTO_READY)
clean up
hci_uart_write_work()

So AFAICT cancel_work_sync() needs to be done after clearing the flag:

if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
write_lock_irqsave(&hu->proto_lock, flags);
clear_bit(HCI_UART_PROTO_READY, &hu->flags);
write_unlock_irqrestore(&hu->proto_lock, flags);

cancel_work_sync(&hu->write_work); // <---

if (hdev) {

(if HCI_UART_PROTO_READY is already clear, then no work can be pending,
so no need to cancel work in that case).



Cheers,

Ronald

2017-10-16 17:08:37

by Dean Jenkins

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.

Hi Ronald,

On 15/10/17 11:40, =?UTF-8?q?Ronald=20Tschal=C3=A4r?= wrote:
> Lastly, the read-lock in the hci_uart_tx_wakeup() callback now needed
> to be removed because that function is called from an IRQ context. But
> since it doesn't actually call any proto functions, instead just
> queueing the work, and the other operations are atomic bit operations,
> this is ok.
You are discussing my commits so I'll try to answer your queries. I am a
bit busy but I'll attempt to answer you in a timely manner.

Sorry, I know nothing about bcm, I will explain using BCSP scenarios.

I use SMP ARM based embedded systems in a commercial environment. These
systems can suffer heavy CPU loading which can expose race conditions.
The crashes are very rare in a PC environment.

I will try to explain why the locking is needed in hci_uart_tx_wakeup().

For reference, I am using the HEAD of Linux 4.14-rc4

There is at least 1 race condition to consider. The atomic variables do
not help because the codeblock as a whole is not atomic. Hence locking
is needed as follows.

The issue is that hci_uart_tty_close() runs asynchronously to
hci_uart_tx_wakeup() which is shown below (assuming a SMP system).

int hci_uart_tx_wakeup(struct hci_uart *hu)
{
        read_lock(&hu->proto_lock);

In parallel, hci_uart_tty_close() can be running here:

        if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
                goto no_schedule;

In parallel, hci_uart_tty_close() can be running here:

        if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {

In parallel, hci_uart_tty_close() can be running here:

                set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
                goto no_schedule;
        }

        BT_DBG("");

In parallel, hci_uart_tty_close() can be running here:

        schedule_work(&hu->write_work);

In parallel, hci_uart_tty_close() can be running here:

no_schedule:
        read_unlock(&hu->proto_lock);

        return 0;
}

hci_uart_tty_close() progressively removes the resources needed by the
scheduled work queue hu->write_work which runs hci_uart_write_work().
Also there is a delay between the scheduling and hci_uart_write_work
actually running. This means hci_uart_write_work() can race with
hci_uart_tty_close(), sometimes causing hci_uart_tty_close() to crash,
for example due to the write buffer no longer being there.

static void hci_uart_write_work(struct work_struct *work)
{
<snipped>
        set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
        len = tty->ops->write(tty, skb->data, skb->len);    <== can
crash inside the write function pointer as the write buffer is no longer
valid
        hdev->stat.byte_tx += len;    <== can crash here as hdev now
invalid
<snipped>
}

Therefore, a robust solution is to lock out hci_uart_tty_close() when
hci_uart_tx_wakeup() runs, that is the reason why
read_lock(&hu->proto_lock) is used in hci_uart_write_work(). The atomic
flag HCI_UART_PROTO_READY is prevented from being cleared whilst
hci_uart_tx_wakeup() runs.

In fact, hci_uart_tty_close() is really a bit of a mess because it
progressively removes resources. It is is based on old code which has
been patched over the many years. Therefore it has kept code which is
probably obsolete or duplicated. Ideally, hci_uart_tty_close() should be
rewritten.

For example, there is a call
cancel_work_sync(&hu->write_work);
in  hci_uart_tty_close() which at first glance seems to be helpful but
it is flawed because hci_uart_tx_wakeup() can schedule a new work item
AFTER cancel_work_sync(&hu->write_work) runs. Therefore, locking is
needed to prevent hci_uart_tx_wakeup() from being scheduled whilst
HCI_UART_PROTO_READY is being cleared in hci_uart_tty_close().

In the case of BCSP, hci_bscp.c calls hci_uart_tx_wakeup() to schedule
the transmission of BCSP frames. This includes BCSP retransmissions.
Therefore, the scenario where a BCSP retransmission is being scheduled
whilst hci_uart_tty_close() runs needs to be protected to avoid crashes.
BCSP has a re-transmission timer which can expire whilst
hci_uart_tty_close() runs.

In older kernels without my commits, when BCSP attempts to transmit
during the execution of hci_uart_tty_close(), it can result in crashes
as outlined above. The flag HCI_UART_PROTO_READY now prevents
transmission and error messages can sometimes be seen from
hci_uart_send_frame() because transmission is prevented.

It is possible that my commits might have been too robust so perhaps
some simplification is possible.

Feel free to send me further questions about my commits. I will try to
answer your queries within a couple of days each time.

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.